PATENT APPLICATION
Attorney Docket No. 15966-556 CIP1 (Cura 56 CIP1)

POLYPEPTIDES AND POLYNUCLEOTIDES ENCODING SAME

RELATED APPLICATIONS

This application is a continuation-in-part of USSN 09/619,252, filed July 19, 2000, which
claims priority to USSN 60/144,722, filed July 20, 1999, and USSN 60/167,785, filed November
29, 1999; and is a continuation-in-part of USSN 60/276,994, filed March 19, 2001; USSN
60/280,898, filed April 2, 2001; USSN 60/332,241, filed November 14, 2001; USSN 60/288,062,
filed May 2, 2001; USSN 60/291,766, filed May 17, 2001; and USSN 60/314,007, filed August
21, 2001. The contents of these applications are incorporated herein by reference in their
entireties.

FIELD OF THE INVENTION 

The invention relates generally to polynucleotides and the polypeptides encoded
thereby, and more particularly to polynucleotides encoding polypeptides that cross one or more
membranes in eukaryotic cells.

BACKGROUND OF THE INVENTION 

Eukaryotic cells are subdivided by membranes into multiple, functionally distinct compartments,
referred to as organelles. Many biologically important proteins are secreted from the cell after crossing
multiple membrane-bound organelles. These proteins can often be identified by the presence of sequence
motifs, referred to as "sorting signals", in the protein or in a precursor form of the protein. These sorting
signals can also aid in targeting the proteins to their appropriate destinations.

One specific type of sorting signal is a signal sequence, which is also referred to as a signal
peptide or leader sequence. The signal sequence can be present as an amino-terminal extension
on a newly synthesized polypeptide. A signal sequence possesses the ability to "target" proteins to an
organelle known as the endoplasmic reticulum (ER).

The signal sequence takes part in an array of protein-protein and protein-lipid interactions that
result in the translocation of a signal sequence-containing polypeptide through a channel within the ER.
Following translocation, a membrane-bound enzyme, designated signal peptidase, liberates the mature
protein from the signal sequence.

Secreted and membrane-bound proteins are involved in many biologically diverse
activities. Examples of known secreted proteins include insulin, interferon, interleukin,
transforming growth factor-β, human growth hormone, erythropoietin, and lymphokine.
Only a limited number of genes encoding human membrane-bound and secreted proteins have
been identified.

Failure to thrive, nutritional edema, and hypoproteinemia with normal sweat electrolytes
were features of 2 affected male infants reported by Townes et al. (J. Pediat. 71: 220-224, 1967);
the condition could be treated with a protein hydrolysate diet. Morris and Fisher (Am. J. Dis.
Child. 114: 203-208, 1967) reported an affected female who also had imperforate anus. The
disorder results from a defect in the synthesis of enterokinase, which activates the proteolytic
enzymes produced by the pancreas. Oral pancreatin represents a therapeutically successful form
of enzyme replacement. Trypsin, like elastase, is a member of the pancreatic family of serine
proteases. MacDonald et al. (J. Biol. Chem. 257: 9724-9732, 1982) reported nucleotide sequences
of cDNAs representing 2 pancreatic rat trypsinogens. The trypsin gene is on mouse chromosome 6
(Honey et al., Somat. Cell Molec. Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are
a syntenic pair conserved in mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA
clones for 2 major human trypsinogen isozymes from a pancreatic cDNA library. The deduced amino
acid sequences had 89% homology and the same number of amino acids (247), including a 15-amino
acid signal peptide and an 8-amino acid activation peptide. Southern blot analysis of human
genomic DNA with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a
family of more than 10 members. The gene encoding trypsin-1 (TRY1) is also referred to as serine
protease-1 (PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8
trypsinogen genes embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping
to 7q35. In the 685-kb DNA segment that they sequenced, they found 5 tandemly arrayed 10-kb
locus-specific repeats (homology units) at the 3-prime end of the locus. These repeats exhibited
90 to 91% overall nucleotide similarity, and embedded within each is a trypsinogen gene.
Alignment of pancreatic trypsinogen cDNAs with the germline sequences showed that these
trypsinogen genes contain 5 exons that span approximately 3.6 kb. They denoted the 8 trypsinogen
genes T1 through T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in
nonpancreatic tissues, where their function is unknown. Rowen et al. (Science 272: 1755-1762,
1996) noted that the intercalation of the trypsinogen genes in the TCRB locus is conserved in
mouse and chicken, suggesting shared functional or regulatory constraints, as has been postulated
for genes in the major histocompatibility complex (such as class I, II, and III genes) that share
similar long-term organizational relationships. The gene of the invention encodes a novel serine
protease that contains a trypsin domain but is localized on chromosome 16.

SUMMARY OF THE INVENTION 

The invention is based, in part, upon the discovery of novel nucleic acids and secreted
polypeptides encoded thereby. The nucleic acids and polypeptides are collectively referred to
herein as "SECP".

Accordingly, in one aspect, the invention includes an isolated nucleic acid that encodes a
SECP polypeptide, or a fragment, homolog, analog, or derivative thereof. For example, the
nucleic acid can encode a polypeptide at least 85% identical to a polypeptide comprising the
amino acid sequence of any of SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49,
51, 53, 55, or 57. The nucleic acid can be, e.g., a genomic DNA fragment or a cDNA molecule. In
some embodiments, the invention provides an isolated nucleic acid molecule that includes the
nucleic acid sequence of any of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48,
50, 52, 54, or 56.
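This section does not say how percent identity is to be computed for the "at least 85% identical" criterion. The following is a minimal Python sketch, assuming the two sequences have already been aligned to equal length with '-' marking gaps, and assuming gap columns count against identity. The function names are hypothetical, and other conventions (for example, dividing by the length of the shorter sequence) would give different numbers.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the columns of a pairwise alignment.

    Both strings must already be aligned to the same length, with '-'
    marking gaps. Gap columns stay in the denominator (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Count columns where both residues agree and neither is a gap.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=85.0):
    """True if the alignment reaches the given percent-identity cutoff."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("MKTAYIAKQR", "MKTAYIAKQL"))  # 90.0
```

With these toy sequences, one mismatch in ten columns gives 90.0, which passes the 85% cutoff; a five-column alignment with one gap column would score 80.0 and fail it.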

Also included within the scope of the invention is a vector containing one or more of the
nucleic acids described herein, and a cell containing the vectors or nucleic acids described
herein.

The invention is also directed to host cells transformed with a vector comprising any of 
the nucleic acid molecules described above. 

In another aspect, the invention includes a pharmaceutical composition that includes a
SECP nucleic acid and a pharmaceutically acceptable carrier or diluent.

In a further aspect, the invention includes a substantially purified SECP polypeptide, e.g.,
any of the SECP polypeptides encoded by a SECP nucleic acid, and fragments, homologs,
analogs, and derivatives thereof. The invention also includes a pharmaceutical composition that
includes a SECP polypeptide and a pharmaceutically acceptable carrier or diluent.

In still a further aspect, the invention provides an antibody that binds specifically to a
SECP polypeptide. The antibody can be, e.g., a monoclonal or polyclonal antibody, or a
fragment, homolog, analog, or derivative thereof. The invention also includes a
pharmaceutical composition including a SECP antibody and a pharmaceutically acceptable carrier
or diluent. The invention is also directed to isolated antibodies that bind to an epitope on a
polypeptide encoded by any of the nucleic acid molecules described above.

The invention also includes kits comprising any of the pharmaceutical compositions
described above. 

The invention further provides a method for producing a SECP polypeptide by providing
a cell containing a SECP nucleic acid, e.g., a vector that includes a SECP nucleic acid, and
culturing the cell under conditions sufficient to express the SECP polypeptide encoded by the
nucleic acid. The expressed SECP polypeptide is then recovered from the cell. Preferably, the
cell produces little or no endogenous SECP polypeptide. The cell can be, e.g., a prokaryotic cell
or eukaryotic cell.

The invention is also directed to methods of identifying a SECP polypeptide or nucleic
acid in a sample by contacting the sample with a compound that specifically binds to the
polypeptide or nucleic acid, and detecting complex formation, if present.

The invention further provides methods of identifying a compound that modulates the
activity of a SECP polypeptide by contacting a SECP polypeptide with a compound and
determining whether the SECP polypeptide activity is modified.

The invention is also directed to compounds that modulate SECP polypeptide activity,
identified by contacting a SECP polypeptide with the compound and determining whether the
compound modifies activity of the SECP polypeptide, binds to the SECP polypeptide, or binds to
a nucleic acid molecule encoding a SECP polypeptide.

In another aspect, the invention provides a method of determining the presence of, or
predisposition to, a SECP-associated disorder in a subject. The method includes providing a
sample from the subject and measuring the amount of SECP polypeptide in the subject sample.
The amount of SECP polypeptide in the subject sample is then compared to the amount of SECP
polypeptide in a control sample. An alteration in the amount of SECP polypeptide in the subject
protein sample relative to the amount of SECP polypeptide in the control protein sample
indicates that the subject has a tissue proliferation-associated condition. A control sample is
preferably taken from a matched individual, i.e., an individual of similar age, sex, or other
general condition but who is not suspected of having a tissue proliferation-associated condition.
Alternatively, the control sample may be taken from the subject at a time when the subject is not
suspected of having a tissue proliferation-associated disorder. In some embodiments, the SECP
is detected using a SECP antibody.

In a further aspect, the invention provides a method of determining the presence of, or
predisposition to, a SECP-associated disorder in a subject. The method includes providing a
nucleic acid sample (e.g., RNA or DNA, or both) from the subject and measuring the amount of
the SECP nucleic acid in the subject nucleic acid sample. The amount of SECP nucleic acid in
the subject nucleic acid sample is then compared to the amount of SECP nucleic acid in a
control sample. An alteration in the amount of SECP nucleic acid in the subject sample relative
to the amount of SECP nucleic acid in the control sample indicates that the subject has a tissue
proliferation-associated disorder.

In a still further aspect, the invention provides a method of treating, preventing, or
delaying a SECP-associated disorder. The method includes administering to a subject in which
such treatment, prevention, or delay is desired a SECP nucleic acid, a SECP polypeptide, or a
SECP antibody in an amount sufficient to treat, prevent, or delay a tissue proliferation-associated
disorder in the subject.

Unless otherwise defined, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art to which this invention
belongs. Although methods and materials similar or equivalent to those described herein can be
used in the practice or testing of the invention, suitable methods and materials are described
below. All publications, patent applications, patents, and other references mentioned herein are
incorporated by reference in their entirety. In the case of conflict, the present Specification,
including definitions, will control. In addition, the materials, methods, and examples are
illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following
detailed description and claims.

BRIEF DESCRIPTION OF THE FIGURES 
FIG. 1 is a representation of a SECP 1 nucleic acid sequence (SEQ ID NO:1) according
to the invention, along with an amino acid sequence (SEQ ID NO:2) encoded by the nucleic acid
sequence.

FIG. 2 is a representation of a SECP 2 nucleic acid sequence (SEQ ID NO:3) according 
to the invention, along with an amino acid sequence (SEQ ID NO:4) encoded by the nucleic acid 
sequence. 

FIG. 3 is a representation of a SECP 3 nucleic acid sequence (SEQ ID NO:5) according
to the invention, along with an amino acid sequence (SEQ ID NO:6) encoded by the nucleic acid
sequence.

FIG. 4 is a representation of a SECP 4 nucleic acid sequence (SEQ ID NO:7) according
to the invention, along with an amino acid sequence (SEQ ID NO:8) encoded by the nucleic acid 
sequence. 

FIG. 5 is a representation of a SECP 5 nucleic acid sequence (SEQ ID NO:9) according
to the invention, along with an amino acid sequence (SEQ ID NO:10) encoded by the nucleic
acid sequence.

FIG. 6 is a representation of a SECP 6 nucleic acid sequence (SEQ ID NO:11) according
to the invention, along with an amino acid sequence (SEQ ID NO:12) encoded by the nucleic
acid sequence.

FIG. 7 is a representation of a SECP 7 nucleic acid sequence (SEQ ID NO:13) according
to the invention, along with an amino acid sequence (SEQ ID NO:14) encoded by the nucleic
acid sequence.

FIG. 8 is a representation of a SECP 8 nucleic acid sequence (SEQ ID NO:15) according
to the invention, along with an amino acid sequence (SEQ ID NO:16) encoded by the nucleic
acid sequence.

FIG. 9 is a representation of a SECP 9 nucleic acid sequence (SEQ ID NO:17) according
to the invention, along with an amino acid sequence (SEQ ID NO:18) encoded by the nucleic
acid sequence.

FIG. 10 is a representation of an alignment of the proteins encoded by clones
11618130.0.27 (SEQ ID NO:4) and 11618130.0.184 (SEQ ID NO:16).

FIG. 11 is a representation of an alignment of the proteins encoded by clones
14578444.0.143 (SECP4; SEQ ID NO:8) and 14578444.0.47 (SECP5; SEQ ID NO:10).

FIG. 12 is a representation of a Western blot of a polypeptide expressed in 293 cells from a
polynucleotide containing sequences of clone 11618130.

FIG. 13 is a representation of a Western blot of a polypeptide expressed in 293 cells from a
polynucleotide containing sequences of clone 16406477.

FIG. 14 is a representation of a real-time expression analysis of the clones of the 
invention. 

FIG. 15 is a representation of a SECP 10 nucleic acid sequence (SEQ ID NO:40)
according to the invention, along with an amino acid sequence (SEQ ID NO:41) encoded by the
nucleic acid sequence.

FIG. 16 is a representation of a SECP 11 nucleic acid sequence (SEQ ID NO:42)
according to the invention, along with an amino acid sequence (SEQ ID NO:43) encoded by the
nucleic acid sequence.

FIG. 17 is a representation of a SECP 12 nucleic acid sequence (SEQ ID NO:44)
according to the invention, along with an amino acid sequence (SEQ ID NO:45) encoded by the
nucleic acid sequence.

FIG. 18 is a representation of a SECP 13 nucleic acid sequence (SEQ ID NO:46) 
according to the invention, along with an amino acid sequence (SEQ ID NO:47) encoded by the 
nucleic acid sequence. 

FIG. 19 is a representation of a SECP 14 nucleic acid sequence (SEQ ID NO:48)
according to the invention, along with an amino acid sequence (SEQ ID NO:49) encoded by the
nucleic acid sequence.

FIG. 20 is a representation of a SECP 15 nucleic acid sequence (SEQ ID NO:50)
according to the invention, along with an amino acid sequence (SEQ ID NO:51) encoded by the
nucleic acid sequence.

FIG. 21 is a representation of a SECP 16 nucleic acid sequence (SEQ ID NO:52) 
according to the invention, along with an amino acid sequence (SEQ ID NO:53) encoded by the 
nucleic acid sequence. 

FIG. 22 is a representation of a SECP 17 nucleic acid sequence (SEQ ID NO:54)
according to the invention, along with an amino acid sequence (SEQ ID NO:55) encoded by the
nucleic acid sequence.

FIG. 23 is a representation of a SECP 18 nucleic acid sequence (SEQ ID NO:56) 
according to the invention, along with an amino acid sequence (SEQ ID NO:57) encoded by the 
nucleic acid sequence. 

DETAILED DESCRIPTION OF THE INVENTION

The invention provides novel polynucleotides and the polypeptides encoded thereby.
Included in the invention are eighteen novel nucleic acid sequences and their encoded polypeptides.
These sequences are collectively referred to as "SECP nucleic acids" or "SECP polynucleotides",
and the corresponding encoded polypeptides are referred to as "SECP polypeptides" or "SECP
proteins". For example, a SECP nucleic acid according to the invention is a nucleic acid
including a SECP nucleic acid sequence, and a SECP polypeptide according to the invention is a
polypeptide that includes the amino acid sequence of a SECP polypeptide. Unless indicated
otherwise, "SECP" is meant to refer to any of the novel sequences disclosed herein. Each of the
nucleic acid and amino acid sequences has been assigned a unique SECP Identification
Number, with designations SECP1 through SECP18.

TABLE 1 provides a cross-reference to the assigned SECP Number, Clone or Probe
Identification Number, and Sequence Identification Number (SEQ ID NO:) for both the nucleic
acid and encoded polypeptides of SECP1-18.

TABLE 1

CLONE/PROBE                              FIGURE   SEQ ID NO:       SEQ ID NO:
                                                  (Nucleic Acid)   (Polypeptide)
21433858                                 1        1                2
11618130.0.27 (also called CG50817-03)   2        3                4
11696905-0-47                            3        5                6
14578444.0.143                           4        7                8
14578444.0.47                            5        9                10
14998905.0.65                            6        11               12
16406477.0.206                           7        13               14
11618130.0.184                           8        15               16
21637262.0.64                            9        17               18
CG106318-01                              15       40               41
CG50817-04                               16       42               43
CG50817-05                               17       44               45
CG50817-06                               18       46               47
CG51099-03                               19       48               49
CG57051-04                               20       50               51
CG57051-05                               21       52               53
CG57051-02                               22       54               55
CG57051-03                               23       56               57
11618130 Forward                                  19
11618130 Reverse                                  20
PSec-V5-His Forward                               21
PSec-V5-His Reverse                               22
16406477 Forward                                  23
16406477 Reverse                                  24
Ag 383 (F)                                        25
Ag 383 (R)                                        26
Ag 383 (P)                                        27
Ag 53 (F)                                         28
Ag 53 (R)                                         29
Ag 53 (P)                                         30
Ag 127 (F)                                        31
Ag 127 (R)                                        32
Ag 127 (P)                                        33
Ab 5 (F)                                          34
Ab 5 (R)                                          35
Ab 5 (P)                                          36
Ag 815 (F)                                        37
Ag 815 (R)                                        38
Ag 815 (P)                                        39

Nucleic acid sequences and polypeptide sequences for SECP nucleic acids and 
polypeptides, as disclosed herein, are provided in the following section of the Specification. 



SECP nucleic acids, and their encoded polypeptides, according to the invention are useful
in a variety of applications and contexts. For example, various SECP nucleic acids and
polypeptides according to the invention are useful, inter alia, as novel members of the protein
families to which they are assigned on the basis of their domains and their sequence relatedness
to previously described proteins.

SECP nucleic acids and polypeptides according to the invention can also be used to
identify cell types based on the presence or absence of various SECP nucleic acids according to
the invention. Additional utilities for SECP nucleic acids and polypeptides are discussed below.

SECP1

A SECP1 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:1) and encoded polypeptide sequence (SEQ ID NO:2) of clone
21433858. FIG. 1 illustrates the nucleic acid and amino acid sequences, as well as the alignment
between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:1) of 6373 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 1588 amino acid
residues (SEQ ID NO:2) with a predicted molecular weight of 178042.1 Daltons. The start
codon is located at nucleotides 235-237 and the stop codon is located at nucleotides 4999-5001.
The protein encoded by clone 21433858 is predicted by the PSORT program to localize in the
plasma membrane with a certainty of 0.7300. The program SignalP predicts that there is a signal
peptide with the most probable cleavage site located between residues 23 and 24, in the sequence
CMG-DE.
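A predicted cleavage site "between residues 23 and 24, in the sequence CMG-DE" means the signal peptide spans residues 1-23 and the mature protein begins at residue 24, with three residues before and two after the site written around a hyphen. The minimal Python sketch below shows that bookkeeping. The function is hypothetical, and because SEQ ID NO:2 is not reproduced in this section, the example uses a made-up precursor whose residues 21-25 were chosen to read CMG-DE. It does not predict cleavage sites the way SignalP does; it only splits a sequence at a site that has already been predicted.

```python
def split_at_cleavage(precursor, cleave_after, left=3, right=2):
    """Split a precursor protein at a predicted signal-peptidase site.

    cleave_after is the 1-based number of the last signal-peptide residue,
    so a site "between residues 23 and 24" is cleave_after=23.
    Returns (signal_peptide, mature_protein, site_label), where site_label
    shows `left` residues before and `right` residues after the site.
    """
    if not 0 < cleave_after < len(precursor):
        raise ValueError("cleavage position must fall inside the sequence")
    signal = precursor[:cleave_after]
    mature = precursor[cleave_after:]
    site_label = f"{signal[-left:]}-{mature[:right]}"
    return signal, mature, site_label


# Hypothetical 28-residue precursor (NOT the SECP1 sequence of SEQ ID NO:2).
toy_precursor = "M" + "A" * 19 + "CMG" + "DEVLK"
signal, mature, site = split_at_cleavage(toy_precursor, 23)
print(len(signal), mature, site)  # 23 DEVLK CMG-DE
```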

Real-time gene expression analysis was performed on SECP1 (clone 21433858). The
results demonstrate that RNA sequences with homology to clone 21433858 are detected in
various cell types. The relative abundance of RNA homologous to clone 21433858 is shown in
FIG. 14 (see also Examples, below). These cell types include endothelial cells (treated and
untreated), pancreas, adipose, adrenal gland, thyroid, mammary gland, myometrium, uterus,
placenta, prostate, and testis, as well as neoplastic cells derived from ovarian carcinoma
OVCAR-3, ovarian carcinoma OVCAR-5, ovarian carcinoma OVCAR-8, ovarian carcinoma IGROV-1,
ovarian carcinoma (ascites) SK-OV-3, breast carcinoma BT-549, prostate carcinoma (bone
metastases) PC-3, melanoma M14, and melanoma (met) SK-MEL-5. Accordingly, SECP1 nucleic acids
according to the invention can be used to identify one or more of these cell types. The presence
of RNA sequences homologous to a SECP1 nucleic acid in a sample indicates that the sample
contains one or more of the above cell types.
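The normalization behind the relative abundances in FIG. 14 is described in the Examples rather than in this section. Purely as an illustration, and assuming the common comparative-Ct convention (not confirmed by this section), relative expression can be computed as 2^(-ΔCt), where ΔCt is the target Ct minus a reference-gene Ct, then scaled so the highest-expressing sample reads 100. The sample names and Ct values below are invented for the example.

```python
def relative_expression(ct_target, ct_reference):
    """Relative expression per sample, scaled so the maximum is 100.

    ct_target and ct_reference map sample name -> threshold cycle (Ct)
    for the gene of interest and a reference gene, respectively.
    A lower Ct means more template, hence the 2 ** (-delta_ct) transform.
    """
    missing = set(ct_target) - set(ct_reference)
    if missing:
        raise KeyError(f"no reference Ct for: {sorted(missing)}")
    raw = {s: 2.0 ** -(ct_target[s] - ct_reference[s]) for s in ct_target}
    top = max(raw.values())
    return {s: 100.0 * value / top for s, value in raw.items()}


# Invented Ct values for two hypothetical samples.
levels = relative_expression(
    {"sample_A": 24.0, "sample_B": 26.0},
    {"sample_A": 18.0, "sample_B": 18.0},
)
print(levels)  # {'sample_A': 100.0, 'sample_B': 25.0}
```

A two-cycle difference in ΔCt corresponds to a fourfold difference in template, which is why sample_B comes out at 25% of sample_A.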

A search of sequence databases using BLASTX reveals that residues 299-1588 of the
polypeptide encoded by clone 21433858 are 100% identical to the 1290-residue human KIAA0960
protein (SPTREMBL-ACC:Q9UPZ6). In addition, the protein of clone 21433858 has 542
of 543 residues (99%) identical to, and 543 of 543 residues (100%) positive with, the 543-residue
fragment of a human hypothetical protein (SPTREMBL-ACC:O60407).
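The percentages quoted alongside the BLAST counts in this section behave like truncated integer percentages rather than rounded ones: 542 of 543 is 99.8%, yet it is reported as 99%, and 119 of 250 (47.6%) is reported as 47%. The small Python sketch below reproduces the quoted figures under that truncation assumption; the helper name is hypothetical.

```python
def reported_percent(matching, aligned_length):
    """Integer percentage, truncated toward zero (e.g. 99.8% -> 99%)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matching) // aligned_length


# (matching residues, aligned length) -> percentage quoted in this specification.
quoted = {
    (542, 543): 99,  # SECP1 vs. SPTREMBL-ACC:O60407, identities
    (83, 250): 33,   # SECP2 vs. human prostasin precursor, identities
    (119, 250): 47,  # SECP2 vs. human prostasin precursor, positives
    (655, 757): 86,  # SECP4 vs. murine matrilin-2, identities
    (702, 757): 92,  # SECP4 vs. murine matrilin-2, positives
}
for (hits, length), pct in quoted.items():
    assert reported_percent(hits, length) == pct
```

Ordinary rounding would disagree for some of these pairs; for example, `round(100 * 542 / 543)` gives 100 rather than the quoted 99.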

The proteins of the invention encoded by clone 21433858 include the protein disclosed as
being encoded by the ORF described herein, as well as any mature protein arising therefrom as a
result of post-translational modifications. Thus, the proteins of the invention encompass both a
precursor and any active forms of the clone 21433858 protein.

SECP2 

A SECP2 nucleic acid and polypeptide according to the invention includes a nucleic acid
sequence (SEQ ID NO:3) and an encoded polypeptide sequence (SEQ ID NO:4) of clone
11618130.0.27, also called CG50817-03. FIG. 2 illustrates the nucleic acid sequence and amino
acid sequence, as well as the alignment between these two sequences.

This clone includes a nucleotide sequence (SEQ ID NO:3) of 1894 nucleotides. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 267 amino
acid residues with a predicted molecular weight of 28043 Daltons. The start codon is at
nucleotides 732-734 and the stop codon is at nucleotides 1534-1536. The protein encoded by
clone 11618130.0.27 is predicted by the PSORT program to localize in the microbody
(peroxisome) with a certainty of 0.5035. The program SignalP predicts that there is no signal
peptide in the encoded polypeptide.
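Predicted molecular weights such as the 28043 Daltons given for the 267-residue SECP2 polypeptide are conventionally obtained by summing average residue masses and adding one water molecule for the free termini. This section does not say which program or mass table was used, so the Python sketch below assumes a commonly used table of average residue masses; other tools may differ slightly in the decimals. The function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
# Assumed table; published tables differ slightly in the last decimals.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear polypeptide."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    unknown = set(residues) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER


print(round(average_molecular_weight("GG"), 2))  # 132.12 (glycylglycine)
```

For a full-length SECP polypeptide, the amino acid sequence from the corresponding SEQ ID NO would be passed in; disulfide bonds, glycosylation, and other post-translational modifications are not accounted for.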

A search of the sequence databases using BLASTP and BLASTX reveals that clone
11618130.0.27 has 330 of 333 residues (99%) identical to and positive with a 571-residue human
protein termed PRO351 (PCT Publication WO9946281-A2, published September 16, 1999). In
addition, it was found to have 83 of 250 residues (33%) identical to, and 119 of 250 residues
(47%) positive with, the 343-residue human prostasin precursor (EC 3.4.21.-)
(SWISSPROT-ACC:Q16651).

The proteins of the invention encoded by clone 11618130.0.27 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modification. Thus, the proteins of the invention
encompass both a precursor and any active forms of the 11618130.0.27 protein.

SECP3 

A SECP3 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:5) and encoded polypeptide sequence (SEQ ID NO:6) of clone
11696905-0-47. FIG. 3 illustrates the nucleic acid sequence and amino acid sequence, as well as
the alignment between these two sequences.

Clone 11696905-0-47 was obtained from fetal brain. In addition, RNA sequences were
also found to be present in tissues including uterus (pregnant and non-pregnant), ovarian
tumor, placenta, bone marrow, hippocampus, synovial membrane, fetal heart, fetal lung, pineal
gland, and melanocytes. This clone includes a nucleotide sequence of 1855 bp (SEQ ID NO:5).
The nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 405
amino acid residues (SEQ ID NO:6) with a predicted molecular weight of 44750 Daltons. The
start codon is located at nucleotides 154-156 and the stop codon is located at nucleotides
1369-1371. The protein encoded by clone 11696905-0-47 is predicted by the PSORT program to
localize extracellularly with a certainty of 0.7332. The program SignalP predicts that there is a
signal peptide with the most probable cleavage site located between residues 25 and 26, in the
sequence AQG-GP.
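The start- and stop-codon coordinates given for each clone determine the length of the encoded polypeptide: the number of nucleotides from the first base of the start codon to the first base of the stop codon, divided by three, is the number of translated codons. The minimal Python sketch below (hypothetical helper name) computes that count and also checks that the two codons share a reading frame.

```python
def orf_length_aa(start_codon_first_nt, stop_codon_first_nt):
    """Amino acids encoded from a start codon up to (not including) a stop codon.

    Both arguments are 1-based positions of the first nucleotide of each codon,
    e.g. 154 for a start codon located at nucleotides 154-156.
    """
    span = stop_codon_first_nt - start_codon_first_nt
    if span <= 0:
        raise ValueError("stop codon must lie downstream of the start codon")
    if span % 3:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3  # the stop codon itself is not translated


print(orf_length_aa(154, 1369))  # 405 (SECP3)
print(orf_length_aa(235, 4999))  # 1588 (SECP1)
```

Both results match the polypeptide lengths stated for SECP3 (405 residues) and SECP1 (1588 residues).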

Real-time gene expression analysis was performed on SECP3 (clone 11696905-0-47).
The results demonstrate that RNA sequences homologous to clone 11696905-0-47 are detected
in various cell types. These cell types include adipose, adrenal gland, thyroid, brain, heart,
skeletal muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, and testis,
as well as neoplastic cells derived from renal carcinoma A498, lung carcinoma NCI-H460, and
melanoma SK-MEL-28.

Accordingly, SECP3 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP3 nucleic acid
in a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 11696905-0-47 has
403 of 405 residues (99%) identical to, and 404 of 405 residues (99%) positive with, the
405-residue human angiopoietin-related protein (SPTREMBL-ACC:Q9Y5B3). Angiopoietin
homologues are useful to stimulate cell growth and tissue development. The polypeptides of
clone 11696905-0-47 tend to be found as multimeric proteins (see Example 7) and are believed
to have angiogenic or hematopoietic activity. They can thus be used in assays for angiogenic
activity, as well as therapeutically to stimulate restoration of vascular structure in various
tissues. Examples of such uses include, but are not limited to, treatment of full-thickness skin
wounds, including venous stasis ulcers and other chronic, non-healing wounds, as well as
fracture repair, skin grafting, reconstructive surgery, and establishment of vascular networks in
transplanted cells and tissues.

The proteins of the invention encoded by clone 11696905-0-47 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 11696905-0-47 protein.

SECP4 

A SECP4 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:7) and encoded polypeptide sequence (SEQ ID NO:8) of clone
14578444.0.143. FIG. 4 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 14578444.0.143 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:7) of 3026 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 776 amino acid residues (SEQ ID NO:8) with a predicted
molecular weight of 86220.8 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2384-2386. The protein encoded by clone 14578444.0.143
is predicted by the PSORT program to localize in the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24, in the sequence AEA-RE.

A search of the sequence databases using BLASTX reveals that clone 14578444.0.143
has 655 of 757 residues (86%) identical to, and 702 of 757 residues (92%) positive with, the
956-residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746), extending over
residues 1-754 of the reference protein. Additional similarities with lower identities are found
in residues 649-837 of the murine protein. The search also shows a lower degree of similarity
to the murine matrilin-4 precursor. The protein of clone 14578444.0.143 also has 595 of 606
residues (98%) identical to, and 598 of 606 residues (98%) positive with, the 632-residue human
matrilin-3 (PCT publication WO9904002-A1).

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolaemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking, including AIDS; allergies,
including hay fever, asthma, and urticaria (hives); autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid arthritis and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes,
systemic lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral,
bacterial, fungal, helminth, and protozoal infections; a neoplastic disorder (e.g.,
adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers);
or an immune disorder (e.g., AIDS, Addison's disease, adult respiratory distress syndrome,
allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease, and
ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.143 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the proteins encoded by clone
14578444.0.143 (SECP4).

SECP5

A SECP5 nucleic acid and polypeptide according to the invention includes the nucleic 
acid sequence (SEQ ID NO:9) and encoded polypeptide sequence (SEQ ID NO: 10) of clone 
14578444.0.47. FIG. 5 illustrates the nucleic acid sequence and amino acid sequence, as well as 
the alignment between these two sequences.

Clone 14578444.0.47 was obtained from fetal brain. This clone includes a nucleotide
sequence (SEQ ID NO:9) of 3447 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 959 amino acid residues (SEQ ID NO:10) with a predicted
molecular weight of 107144 Daltons. The start codon is located at nucleotides 55-57 and the
stop codon is located at nucleotides 2933-2935. The protein encoded by clone 14578444.0.47 is
predicted by the PSORT program to localize to the endoplasmic reticulum (membrane) with a
certainty of 0.8200. The program SignalP predicts that there is a signal peptide with the most
probable cleavage site located between residues 23 and 24, in the sequence AEA-RE.
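
In the notation used throughout this section (three residues before the cut and two after, e.g. AEA-RE, GIG-AE, CWG-AG), a cleavage site "between residues 23 and 24" leaves residues 1-23 as the signal peptide and residue 24 onward as the mature chain. The following is a minimal sketch of that split. It uses a hypothetical toy sequence, not SEQ ID NO:10, and the helper names are illustrative only.

```python
def split_at_cleavage(precursor: str, site: int) -> tuple[str, str]:
    """Split a precursor at a predicted signal-peptide cleavage site.

    `site` is the 1-based number of the last signal-peptide residue, so a
    site "between residues 23 and 24" is passed as 23.
    """
    if not 0 < site < len(precursor):
        raise ValueError("cleavage site must fall inside the sequence")
    return precursor[:site], precursor[site:]


def site_context(precursor: str, site: int, before: int = 3, after: int = 2) -> str:
    """Format the residues flanking the cut in the 'AEA-RE' style used above."""
    left = precursor[max(0, site - before):site]
    right = precursor[site:site + after]
    return f"{left}-{right}"


# Hypothetical 30-residue precursor (NOT SEQ ID NO:10), built so that
# residues 21-23 are AEA and residues 24-25 are RE.
toy = "M" + "L" * 19 + "AEA" + "RE" + "QKVSD"
signal, mature = split_at_cleavage(toy, 23)
print(len(signal), len(mature))   # 23 7
print(site_context(toy, 23))      # AEA-RE
```

The "mature protein" referred to elsewhere in this section would correspond to the second element of the returned pair, before any further post-translational processing.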

A search of the sequence databases using BLASTX reveals that clone 14578444.0.47 has
829 of 959 residues (86%) identical to, and 887 of 959 residues (92%) positive with, the 956
residue murine matrilin-2 precursor protein (SWISSPROT-ACC:O08746). The protein
encoded by clone 14578444.0.47 also has 594 of 606 residues (98%) identical to, and 597 of 606
residues (98%) positive with, the 632 residue human matrilin-3 (PCT publication WO9904002).
In addition, the protein encoded by clone 14578444.0.47 has 616 of 678 residues (90%)
identical to, and 632 of 678 residues (93%) positive with, the 915 residue human protein PRO219
(PCT publication WO9914328-A2).
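
Each percentage in these BLASTX summaries is the count of identical (or positive) residues divided by the aligned length. The reported figures appear consistent with truncating to a whole number rather than rounding. For example, 597/606 is about 98.5% and is reported as 98%, and 616/678 is about 90.9% and is reported as 90%. The sketch below reproduces the reported values under that assumption. It is an illustrative check, not BLAST's own code, and the helper name is hypothetical.

```python
def blast_percent(matches: int, aligned: int) -> int:
    """Whole-number percentage, truncated toward zero (not rounded)."""
    return (100 * matches) // aligned


# (matching residues, aligned length, percentage as printed in the text above)
reported = [
    (829, 959, 86), (887, 959, 92),   # vs. the murine matrilin-2 precursor
    (594, 606, 98), (597, 606, 98),   # vs. human matrilin-3
    (616, 678, 90), (632, 678, 93),   # vs. the 915-residue human protein
]
for matches, aligned, printed in reported:
    print(matches, aligned, blast_percent(matches, aligned), printed)
```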

The proteins encoded by clones 14578444.0.143 (SECP4) and 14578444.0.47 (SECP5)
are compared in an amino acid residue alignment shown in FIG. 11. The main portions of the
two proteins, starting from their amino termini, are virtually identical, while the short stretches
in each that correspond to the carboxyl-terminal sequence of the shorter protein, clone
14578444.0.143, differ from one another. Furthermore, clone 14578444.0.47 has an extended
carboxyl-terminal sequence that is missing from clone 14578444.0.143. Clones
14578444.0.143 (SECP4) and 14578444.0.47 (SECP5) therefore appear to be splice variants
of one another that differ at their carboxyl-terminal ends.
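
The pattern described above (identical sequence from the amino terminus, then distinct carboxyl-terminal stretches, with one form extending further) can be illustrated with a longest-common-prefix comparison. This is a simplified sketch on hypothetical toy sequences, not the SECP4 and SECP5 proteins. A real comparison such as FIG. 11 uses a sequence alignment, which also tolerates internal gaps that this prefix scan does not.

```python
def compare_variants(a: str, b: str) -> dict:
    """Compare two protein sequences that share an amino-terminal region.

    Returns the length of the identical N-terminal prefix and the
    remaining (divergent) C-terminal tail of each sequence.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return {
        "shared_prefix": shared,
        "tail_a": a[shared:],
        "tail_b": b[shared:],
    }


# Hypothetical toy sequences, NOT the SECP4/SECP5 proteins.
short_form = "MKTLLVAGQWSEDK"
long_form = "MKTLLVAGQWRPNGTHEVLC"
print(compare_variants(short_form, long_form))
# {'shared_prefix': 10, 'tail_a': 'SEDK', 'tail_b': 'RPNGTHEVLC'}
```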

The matrilin proteins and polynucleotides can be used for treating a variety of
developmental disorders (e.g., renal tubular acidosis, anemia, Cushing's syndrome). The proteins
can serve as targets for antagonists that should be of use in treating diseases related to abnormal
vesicle trafficking. These may include, but are not limited to, diseases such as cystic fibrosis,
glucose-galactose malabsorption syndrome, hypercholesterolaemia, diabetes mellitus, diabetes
insipidus, hyper- and hypoglycemia, Graves disease, goiter, Cushing's disease, Addison's
disease, gastrointestinal disorders including ulcerative colitis, gastric and duodenal ulcers, and
other conditions associated with abnormal vesicle trafficking including AIDS, and allergies
including hay fever, asthma, and urticaria (hives), autoimmune hemolytic anemia, proliferative
glomerulonephritis, inflammatory bowel disease, multiple sclerosis, myasthenia gravis,
rheumatoid and osteoarthritis, scleroderma, Chediak-Higashi and Sjogren's syndromes, systemic
lupus erythematosus, toxic shock syndrome, traumatic tissue damage, and viral, bacterial,
fungal, helminth, and protozoal infections, a neoplastic disorder (e.g., adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers), or an immune
disorder (e.g., AIDS, Addison's disease, adult respiratory distress syndrome, allergies, anemia,
asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease and ulcerative colitis).

The proteins of the invention encoded by clone 14578444.0.47 include the protein 
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention 
encompass both a precursor and any active forms of the proteins encoded by clone 
14578444.0.47 (SECP5). 

SECP6 

A SECP6 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:11) and encoded polypeptide sequence (SEQ ID NO:12) of clone
14998905.0.65. FIG. 6 illustrates the nucleic acid sequence and amino acid sequence, as well as 
the alignment between these two sequences. 

Clone 14998905.0.65 was obtained from lymphoid tissue, in particular, from the lymph
node. This clone includes a nucleotide sequence (SEQ ID NO:11) of 967 bp. The nucleotide
sequence includes an open reading frame (ORF) encoding a polypeptide of 245 amino acid
residues (SEQ ID NO:12) with a predicted molecular weight of 27327.2 Daltons. The start
codon is located at nucleotides 166-168 and the stop codon is located at nucleotides 902-904.
The protein encoded by clone 14998905.0.65 is predicted by the PSORT program to localize in
the microbody (peroxisome) with a certainty of 0.7480. PSORT predicts that there is no
amino-terminal signal sequence. In contrast, the program SignalP predicts that there is a signal
peptide with the most probable cleavage site located between residues 20 and 21, in the
sequence GIG-AE.
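
The text does not specify how the predicted molecular weights were calculated. One common approach sums an average mass for each residue (the amino acid minus one water) and adds one water for the free termini. The sketch below follows that approach. The residue-mass table holds commonly tabulated average values and is an assumption, since different tools use slightly different tables. It is not claimed to reproduce the exact figures above (e.g., 27327.2 Daltons), and it ignores post-translational modifications. The protein sequences themselves are not reproduced here, so the example uses trivial peptides.

```python
# Assumed average residue masses in daltons (residue = amino acid minus H2O).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water molecule accounts for the free N- and C-termini


def average_mw(protein: str) -> float:
    """Predicted average molecular weight of an unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(average_mw("G"), 2))    # 75.07 (glycine)
print(round(average_mw("GG"), 2))   # 132.12 (glycylglycine)
```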

A search of the sequence databases using BLASTX reveals that clone 14998905.0.65 has
204 of 226 residues (90%) identical to, and 214 of 226 residues (94%) positive with, the 834
residue murine semaphorin 4C precursor protein (SWISSPROT-ACC:Q64151). Semaphorin 4C
is annotated as a type I membrane protein that is widely expressed in the nervous system during
development, and it contains one immunoglobulin-like C2-type domain. The protein
encoded by clone 14998905.0.65 also has similarities to mouse CD100 antigen (PCT publication
WO9717368-A1) and to human semaphorin (JP10155490-A).

The proteins of the invention encoded by clone 14998905.0.65 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising 
therefrom as a result of post-translational modifications. Thus, the proteins of the invention 
encompass both a precursor and any active forms of the clone 14998905.0.65 protein. 

SECP7 

A SECP7 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:13) and encoded polypeptide sequence (SEQ ID NO:14) of clone
16406477.0.206. FIG. 7 illustrates the nucleic acid sequence and amino acid sequence, as well
as the alignment between these two sequences.

Clone 16406477.0.206 was obtained from testis. In addition, sequences of clone
16406477.0.206 were found in an RNA pool derived from adrenal gland, mammary gland,
prostate gland, testis, uterus, bone marrow, melanoma, pituitary gland, thyroid gland and spleen.
This clone includes a nucleotide sequence (SEQ ID NO:13) of 1359 bp with an open
reading frame (ORF) encoding a polypeptide of 385 amino acid residues (SEQ ID NO:14) with a
predicted molecular weight of 43087.3 Daltons. The start codon is located at nucleotides 45-47
and the stop codon is located at nucleotides 1201-1203. The protein encoded by clone
16406477.0.206 is predicted by the PSORT program to localize extracellularly with a certainty
of 0.5804 and to have a cleavable amino-terminal signal sequence. The program SignalP
predicts that there is a signal peptide with the most probable cleavage site located between
residues 39 and 40, in the sequence CWG-AG.

Real-time expression analysis was performed on SECP7 (clone 16406477.0.206). The
results demonstrate that RNA homologous to this clone is found in multiple cell and tissue types.
These include brain, mammary gland, and testis, as well as neoplastic cells derived
from ovarian carcinoma OVCAR-3, ovarian carcinoma OVCAR-5, ovarian carcinoma
OVCAR-8, ovarian carcinoma IGROV-1, breast carcinoma (pleural effusion) T47D, breast
carcinoma BT-549, and melanoma M14. Real-time gene expression analysis was performed on
SECP3 (clone 11696905.0.47). The results demonstrate that RNA sequences homologous to
clone 11696905.0.47 are detected in various cell types, including adipose, adrenal gland,
thyroid, brain, heart, skeletal muscle, bone marrow, colon, bladder, liver, lung, mammary
gland, placenta, and testis, as well as neoplastic cells derived from renal carcinoma A498,
lung carcinoma NCI-H460, and melanoma SK-MEL-28.

Accordingly, SECP7 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP7 nucleic
acid in a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 16406477.0.206 is
100% identical to a human testis-specific protein, TSP50 (SPTREMBL-ACC:Q9UI38), which has a
trypsin/chymotrypsin-like domain. In addition, the protein encoded by clone 16406477.0.206
has low similarity to the 343 residue human prostasin precursor (EC 3.4.21.-) (SWISSPROT-ACC:Q16651).

The proteins of the invention encoded by clone 16406477.0.206 include the protein 
disclosed as being encoded by the ORF described herein, as well as any mature protein arising 
therefrom as a result of post-translational modifications. Thus, the proteins of the invention 
encompass both a precursor and any active forms of the clone 16406477.0.206 protein.

SECP8 

A SECP8 nucleic acid and polypeptide according to the invention includes the nucleic 
acid sequence (SEQ ID NO: 15) and encoded polypeptide sequence (SEQ ID NO: 16) of clone 
11618130.0.184. FIG. 8 illustrates the nucleic acid sequence and amino acid sequence, as well 
as the alignment between these two sequences.

Clone 11618130.0.184 includes a nucleotide sequence (SEQ ID NO:15) of 1445 bp. The
nucleotide sequence includes an open reading frame (ORF) encoding a polypeptide of 198 amino
acid residues (SEQ ID NO:16) with a predicted molecular weight of 20659 Daltons. The start
codon is located at nucleotides 732-734 and the stop codon is located at nucleotides 1326-1328.
The protein encoded by clone 11618130.0.184 is predicted by the PSORT program to localize in
the cytoplasm. The program SignalP predicts that there is no signal peptide.
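
The ORF coordinates given in these descriptions can be cross-checked with simple arithmetic. Assume 1-based positions, with the start codon translated and the stop codon not translated. The number of encoded residues is then (first nucleotide of the stop codon minus first nucleotide of the start codon) divided by 3. For SECP8 this gives (1326 - 732) / 3 = 198, which matches the stated length. The helper below is a hypothetical illustration of that check, not a tool named in the source.

```python
def orf_protein_length(start_codon_first: int, stop_codon_first: int) -> int:
    """Number of residues encoded between a start codon and an in-frame stop.

    Both arguments are 1-based positions of the first nucleotide of each
    codon; the stop codon itself is not translated.
    """
    coding_nt = stop_codon_first - start_codon_first
    if coding_nt <= 0 or coding_nt % 3:
        raise ValueError("stop codon is not in frame downstream of the start")
    return coding_nt // 3


print(orf_protein_length(732, 1326))  # 198 (SECP8: start 732-734, stop 1326-1328)
print(orf_protein_length(51, 1356))   # 435 (SECP9: start 51-53, stop 1356-1358)
```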
Clones 11618130.0.184 (SECP8) and 11618130.0.27 (SECP2) are identical over most of their
common sequence and differ only at the carboxyl-terminal end. In addition, clone 11618130.0.27
extends further at the carboxyl-terminal end than does clone 11618130.0.184. An alignment of
clones 11618130.0.27 and 11618130.0.184 is shown in FIG. 10.

The proteins of the invention encoded by clone 11618130.0.184 include the protein 
disclosed as being encoded by the ORF described herein, as well as any mature protein arising 
therefrom as a result of post-translational modifications. Thus, the proteins of the invention 
encompass both a precursor and any active forms of the clone 11618130.0.184 protein.

SECP9

A SECP9 nucleic acid and polypeptide according to the invention includes the nucleic 
acid sequence (SEQ ID NO: 17) and encoded polypeptide sequence (SEQ ID NO: 18) of clone 
21637262.0.64. FIG. 9 illustrates the nucleic acid sequence and amino acid sequence, as well as 
the alignment between these two sequences. 

Clone 21637262.0.64 was obtained from salivary gland. This clone includes a nucleotide
sequence (SEQ ID NO:17) of 1600 bp. The nucleotide sequence includes an open reading frame
(ORF) encoding a polypeptide of 435 amino acid residues (SEQ ID NO:18) with a predicted
molecular weight of 47162.5 Daltons. The start codon is located at nucleotides 51-53 and the
stop codon is located at nucleotides 1356-1358. The protein encoded by clone 21637262.0.64 is
predicted by the PSORT program to localize in the cytoplasm with a certainty of 0.4500. Both
PSORT and SignalP predict that the protein has no amino-terminal signal sequence.

Real-time expression analysis was performed on SECP9 (clone 21637262.0.64). The
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell
types. The relative amounts of RNA in various cell types are shown in FIG. 14 (see also the
Examples, below). These include myometrium, placenta, uterus, prostate, and testis, as well as
neoplastic cells derived from breast carcinoma (pleural effusion) T47D, breast carcinoma
(pleural effusion) MDA-MB-231, breast carcinoma BT-549, ovarian carcinoma OVCAR-3,
ovarian carcinoma OVCAR-5, prostate carcinoma (bone metastases) PC-3, melanoma M14, and
melanoma LOX IMVI.
Accordingly, SECP9 nucleic acids according to the invention can be used to identify one
or more of these cell types. The presence of RNA sequences homologous to a SECP9 nucleic
acid in a sample indicates that the sample contains one or more of the above cell types.

A search of the sequence databases using BLASTX reveals that clone 21637262.0.64 has
123 of 420 residues (29%) identical to, and 201 of 420 residues (47%) positive with, the 1130
residue murine protein repetin (SWISSPROT-ACC:P97347). Repetin, an epidermal differentiation
protein, is a member of the "fused gene" subgroup within the S100 gene family.

The proteins of the invention encoded by clone 21637262.0.64 include the protein 
disclosed as being encoded by the ORF described herein, as well as any mature protein arising 
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone 21637262.0.64 protein. 

SECP10

A SECP10 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:40) and encoded polypeptide sequence (SEQ ID NO:41) of clone
CG106318. FIG. 15 illustrates the nucleic acid sequence and amino acid sequence. This clone
includes a nucleotide sequence (SEQ ID NO:40) of 4810 bp. The nucleotide sequence includes
an open reading frame (ORF) encoding a polypeptide of 1588 amino acid residues (SEQ ID
NO:41). The start codon is located at nucleotides 18-21 and the stop codon is located at
nucleotides 4782-4785. The protein encoded by clone CG106318-01 is predicted by the PSORT
program to localize in the nucleus with a certainty of 0.3500. Both PSORT and SignalP
predict that the protein has no amino-terminal signal sequence.

Real-time expression analysis was performed on SECP10 (clone CG106318). The results
demonstrate that RNA homologous to this clone is present in multiple tissue and cell types.

Accordingly, SECP10 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP10
nucleic acid in a sample indicates that the sample contains one or more of these tissue types.

A search of the sequence databases using BLASTX reveals that clone CG106318 has
1587 of its 1588 residues (99.9%) identical to a human protein utilized in the treatment of
central nervous system disorders (AAM39295 to HYSEQ INC.).

The proteins of the invention encoded by clone CG106318-01 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG106318-01 protein.

PSORT (Prediction of Protein Translocation Sites), version 5.8

Results Summary:

plasma membrane                    --- Certainty=0.7000 (Affirmative) < succ>
nucleus                            --- Certainty=0.3500 (Affirmative) < succ>
microbody (peroxisome)             --- Certainty=0.3000 (Affirmative) < succ>
endoplasmic reticulum (membrane)   --- Certainty=0.2000 (Affirmative) < succ>
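
If a PSORT summary like this one needs to be processed programmatically, its result lines can be parsed and ranked by certainty. This minimal sketch assumes each line has the layout "<site> --- Certainty=<value> (...)", as transcribed above. Actual program output may differ in spacing, and parse_psort is a hypothetical helper.

```python
import re

# Result lines in the layout of the PSORT summary above (copied for the demo).
psort_summary = """\
plasma membrane --- Certainty=0.7000 (Affirmative) < succ>
nucleus --- Certainty=0.3500 (Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.3000 (Affirmative) < succ>
endoplasmic reticulum (membrane) --- Certainty=0.2000 (Affirmative) < succ>
"""

# Site name, a run of hyphens, then the numeric certainty value.
LINE = re.compile(r"^(?P<site>.+?)\s+-+\s+Certainty=\s*(?P<score>[0-9.]+)")


def parse_psort(text: str) -> list[tuple[str, float]]:
    """Return (site, certainty) pairs, highest certainty first."""
    hits = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            hits.append((m.group("site"), float(m.group("score"))))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


for site, score in parse_psort(psort_summary):
    print(f"{site}: {score:.2f}")
```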



PFAM Domain Analysis
Query: 106318-01

Scores for sequence family classification (score includes all domains):

Model        Description                          Score    E-value    N
tsp_1        Thrombospondin type 1 domain         169.5    5.4e-47    11
toxin        Snake toxin                          -16.1    1.3        1
DUF18        Domain of unknown function DUF18     -55.9    7.8        1
Keratin_B2   Keratin, high sulfur B2 protein      -81.1    6.6        1
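
In this table, only the thrombospondin type 1 model (tsp_1, 11 domains) has a strongly significant E-value. The other three models have negative scores and E-values above 1. The following is a minimal sketch of filtering such rows by E-value. The cutoff of 1e-3 is an illustrative assumption, not a threshold stated in the source, and the PfamHit record type is hypothetical.

```python
from typing import NamedTuple


class PfamHit(NamedTuple):
    model: str
    description: str
    score: float
    evalue: float
    n_domains: int


# Rows transcribed from the PFAM table above.
hits = [
    PfamHit("tsp_1", "Thrombospondin type 1 domain", 169.5, 5.4e-47, 11),
    PfamHit("toxin", "Snake toxin", -16.1, 1.3, 1),
    PfamHit("DUF18", "Domain of unknown function DUF18", -55.9, 7.8, 1),
    PfamHit("Keratin_B2", "Keratin, high sulfur B2 protein", -81.1, 6.6, 1),
]


def significant(rows: list[PfamHit], max_evalue: float = 1e-3) -> list[PfamHit]:
    """Keep hits whose E-value is at or below a (hypothetical) cutoff."""
    return [row for row in rows if row.evalue <= max_evalue]


for row in significant(hits):
    print(row.model, row.n_domains, row.evalue)   # tsp_1 11 5.4e-47
```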






Sequences producing High-scoring Segment Pairs:                   Score   P(N)       N

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat         24050   0.0        1
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f....     19495   0.0        1
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f          3611   5.3e-269   6
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo           272   0.16       1
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0           185   0.998      1

BLASTP: (1588 letters)

Database: Non-Redundant Composite Protein
704,847 sequences; 219,724,008 total letters.

                                                                          Smallest
                                                                            Sum
                                                                 High   Probability
Sequences producing High-scoring Segment Pairs:                  Score     P(N)      N

ptnr:REMTREMBL-ACC:CAC32422 Sequence 1 from Patent WO0105...     8965   0.0        1
ptnr:SPTREMBL-ACC:Q9UPZ6 KIAA0960 PROTEIN - Homo sapiens ...     7298   0.0        1
ptnr:SPTREMBL-ACC:Q9C0I4 KIAA1679 PROTEIN - Homo sapiens ...     3983   0.0        1
ptnr:SPTREMBL-ACC:O60407 HYPOTHETICAL PROTEIN - Homo sapi...     3026   3.1e-315   1

TABLE 2. BLASTN VERSUS GENBANK COMPOSITE 






Sequences producing High-scoring Segment Pairs:                   Score   P(N)       N

gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Pat         24050   0.0        1
gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA f....     19495   0.0        1
gb:GENBANK-ID:AB051466|acc:AB051466.1 Homo sapiens mRNA f.         3611   5.3e-269   6
gb:GENBANK-ID:AB006087|acc:AB006087.1 Danio rerio mRNA fo           272   0.16       1
gb:GENBANK-ID:AF111298|acc:AF111298.1 HIV-1 isolate eur-0           185   0.998      1



>gb:GENBANK-ID:AX079870|acc:AX079870.1 Sequence 1 from Patent WO0105971 - Homo
sapiens, 6373 bp. (SEQ ID NO:58)
Length = 6373

Plus Strand HSPs:

Score = 24050 (3608.5 bits), Expect = 0.0, P = 0.0
Identities = 4810/4810 (100%), Positives = 4810/4810 (100%), Strand = Plus / Plus
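
For an ungapped Plus/Plus HSP such as this one (4810/4810 identities), query and subject coordinates differ by a constant offset. Here the alignment begins at Query 1 and Sbjct 218. The following is a minimal sketch of that mapping. It is valid only for gap-free alignments on the same strand, and query_to_subject is a hypothetical helper name.

```python
def query_to_subject(q_pos: int, q_start: int, s_start: int) -> int:
    """Map a query position onto the subject in an ungapped Plus/Plus HSP."""
    return s_start + (q_pos - q_start)


# HSP against AX079870: Query 1 aligns with Sbjct 218.
print(query_to_subject(4810, 1, 218))    # 5027
# HSP against AB023177 (reported below): Query 912 aligns with Sbjct 1.
print(query_to_subject(4810, 912, 1))    # 3899
```

The second call is consistent with the 3899/3899 identities reported for AB023177, since 4810 - 912 + 1 = 3899.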
[Alignment rows: Query 1-4810 aligned with Sbjct 218-5027, Plus/Plus, 4810/4810 identities]



25 



Table 3. BLASTN VERSUS GENBANK COMPOSITE

>gb:GENBANK-ID:AB023177|acc:AB023177.1 Homo sapiens mRNA for KIAA0960 protein,
partial cds - Homo sapiens, 5032 bp. (SEQ ID NO:59)
Length = 5032

Plus Strand HSPs:

Score = 19495 (2925.0 bits), Expect = 0.0, P = 0.0
Identities = 3899/3899 (100%), Positives = 3899/3899 (100%), Strand = Plus / Plus

Query: 912  GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 971
Sbjct: 1    GAGTGGAGCCCCTGCTCAAAAACATGCCATGACATGGTGTCCCCTGCAGGCACTCGTGTA 60

Query: 972  AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 1031
Sbjct: 61   AGGACACGAACCATCAGGCAGTTTCCCATTGGCAGTGAAAAGGAGTGTCCAGAATTTGAA 120

Query: 1032 GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 1091
Sbjct: 121  GAAAAAGAACCCTGTTTGTCTCAAGGAGATGGAGTTGTCCCCTGTGCCACGTATGGCTGG 180

Query: 1092 AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 1151
Sbjct: 181  AGAACTACAGAGTGGACTGAGTGCCGTGTGGACCCTTTGCTCAGTCAGCAGGACAAGAGG 240

Query: 1152 CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 1211
Sbjct: 241  CGCGGCAACCAGACGGCCCTCTGTGGAGGGGGCATCCAGACCCGAGAGGTGTACTGCGTG 300

Query: 1212 CAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAGAAGCCTCAAAG 1271
Sbjct: 301  CAGGCCAACGAAAACCTCCTCTCACAATTAAGTACCCACAAGAACAAAGAAGCCTCAAAG 360

Query: 1272 CCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCCACATT 1331
Sbjct: 361  CCAATGGACTTAAAATTATGCACTGGACCTATCCCTAATACTACACAGCTGTGCCACATT 420

Query: 1332 CCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 1391
Sbjct: 421  CCTTGTCCAACTGAATGTGAAGTTTCACCTTGGTCAGCTTGGGGACCTTGTACTTATGAA 480

Query: 1392 AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 1451
Sbjct: 481  AACTGTAATGATCAGCAAGGGAAAAAAGGCTTCAAACTGAGGAAGCGGCGCATTACCAAT 540

Query: 1452 GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 1511
Sbjct: 541  GAGCCCACTGGAGGCTCTGGGGTAACCGGAAACTGCCCTCACTTACTGGAAGCCATTCCC 600

Query: 1512 TGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 1571
Sbjct: 601  TGTGAAGAGCCTGCCTGTTATGACTGGAAAGCGGTGAGACTGGGAGACTGCGAGCCAGAT 660

Query: 1572 AACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGTGAT 1631
Sbjct: 661  AACGGAAAGGAGTGTGGTCCAGGCACGCAAGTTCAAGAGGTTGTGTGCATCAACAGTGAT 720

Query: 1632 GGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCCCTGTGGCCTGT 1691
Sbjct: 721  GGAGAAGAAGTTGACAGACAGCTGTGCAGAGATGCCATCTTCCCCATCCCTGTGGCCTGT 780

Query: 1692 GATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGTCCTCCTGCTCA 1751
Sbjct: 781  GATGCCCCATGCCCGAAAGACTGTGTGCTCAGCACATGGTCTACGTGGTCCTCCTGCTCA 840

Query: 1752 CACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTGGCC 1811
Sbjct: 841  CACACCTGCTCAGGGAAAACGACAGAAGGGAAACAGATACGAGCACGATCCATTCTGGCC 900

Query: 1812 TATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGCAAGAAGTACGA 1871
Sbjct: 901  TATGCGGGTGAAGAAGGTGGAATTCGCTGTCCAAATAGCAGTGCTTTGCAAGAAGTACGA 960

Query: 1872 AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCCAGTGC 1931
Sbjct: 961  AGCTGTAATGAGCATCCTTGCACAGTGTACCACTGGCAAACTGGTCCCTGGGGCCAGTGC 1020

Query: 1932 ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1991
Sbjct: 1021 ATTGAGGACACCTCAGTATCGTCCTTCAACACAACTACGACTTGGAATGGGGAGGCCTCC 1080

Query: 1992 TGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATGTGGGCCAAGTG 2051
Sbjct: 1081 TGCTCTGTCGGCATGCAGACAAGAAAAGTCATCTGTGTGCGAGTCAATGTGGGCCAAGTG 1140

Query: 2052 GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTTGTCTGCTTCCT 2111
Sbjct: 1141 GGACCCAAAAAATGTCCTGAAAGCCTTCGACCTGAAACTGTAAGGCCTTGTCTGCTTCCT 1200

Query: 2112 TGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCGTGT 2171
Sbjct: 1201 TGTAAGAAGGACTGTATTGTGACCCCATATAGTGACTGGACATCATGCCCCTCTTCGTGT 1260

Query: 2232 GCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 2291
Sbjct: 1321 GCCAACGGGGGCCGAGACTGCACAGATCCCCTCTATGAAGAGAAGGCCTGTGAGGCACCT 1380

Query: 2352 TGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTGGGCGACAGGCA 2411
Sbjct: 1441 TGGAGCGTGCAACAAGACAGCCCTGGAGCACAGGAAGGCTGTGGGCCTGGGCGACAGGCA 1500

Sbjct: 1501 AGAGCCATTACTTGTCGCAAGCAAGATGGAGGACAGGCTGGAATCCATGAGTGCCTACAG 1560

Query: 2472 TATGCAGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 2531
Sbjct: 1561 TATGCAGGCCCTGTGCCAGCCCTTACCCAGGCCTGCCAGATCCCCTGCCAGGATGACTGT 1620

Query: 2532 CAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 2591
Sbjct: 1621 CAATTGACCAGCTGGTCCAAGTTTTCTTCATGCAATGGAGACTGTGGTGCAGTTAGGACC 1680

Query: 2592 AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 2651
Sbjct: 1681 AGAAAGCGCACTCTTGTTGGAAAAAGTAAAAAGAAGGAAAAATGTAAAAATTCCCATTTG 1740

Query: 2652 TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 2711
Sbjct: 1741 TATCCCCTGATTGAGACTCAGTATTGTCCTTGTGACAAATATAATGCACAACCTGTGGGG 1800

Query: 2712 AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGGGAATGAAAGTA 2771
Sbjct: 1801 AACTGGTCAGACTGTATTTTACCAGAGGGAAAAGTGGAAGTGTTGCTGGGAATGAAAGTA 1860

Query: 2772 CAAGGAGACATCAAGGAATGCGGACAAGGATATCGTTACCAAGCAATGGCATGCTACGAT 2831
Sbjct: 1861 CAAGGAGACATCAAGGAATGCGGACAAGGATATCGTTACCAAGCAATGGCATGCTACGAT 1920

Query: 2832 CAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAGGCC 2891
Sbjct: 1921 CAAAATGGCAGGCTTGTGGAAACATCTAGATGTAACAGCCATGGTTACATTGAGGAGGCC 1980

Query: 2892 TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2951
Sbjct: 1981 TGCATCATCCCCTGCCCCTCAGACTGCAAGCTCAGTGAGTGGTCCAACTGGTCGCGCTGC 2040

Query: 2952 AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTGAAAAACCATAT 3011
Sbjct: 2041 AGCAAGTCCTGTGGGAGTGGTGTGAAGGTTCGTTCTAAATGGCTGCGTGAAAAACCATAT 2100

Query: 3012 AATGGAGGAAGGCCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGGTGTATGAGGTT 3071
Sbjct: 2101 AATGGAGGAAGGCCTTGCCCCAAACTGGACCATGTCAACCAGGCACAGGTGTATGAGGTT 2160

Query: 3072 GTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCTGGAGCATCTGC 3131
Sbjct: 2161 GTCCCATGCCACAGTGACTGCAACCAGTACCTATGGGTCACAGAGCCCTGGAGCATCTGC 2220

Query: 3132 AAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAACCCGAAAAGTG 3191
Sbjct: 2221 AAGGTGACCTTTGTGAATATGCGGGAGAACTGTGGAGAGGGCGTGCAAACCCGAAAAGTG 2280

Query: 3192 AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTGAC 3251
Sbjct: 2281 AGATGCATGCAGAATACAGCAGATGGCCCTTCTGAACATGTAGAGGATTACCTCTGTGAC 2340

Query: 3252 CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 3311
Sbjct: 2341 CCAGAAGAGATGCCCCTGGGCTCTAGAGTGTGCAAATTACCATGCCCTGAGGACTGTGTG 2400

Query: 3312 ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 3371
Sbjct: 2401 ATATCTGAATGGGGTCCATGGACCCAATGTGTTTTGCCTTGCAATCAAAGCAGTTTCCGG 2460

Query: 3372 CAAAGGTCAGCTGATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 3431
Sbjct: 2461 CAAAGGTCAGCTGATCCCATCAGACAACCAGCTGATGAAGGAAGATCTTGCCCTAATGCT 2520

Query: 3432 GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 3491
Sbjct: 2521 GTTGAGAAAGAACCCTGTAACCTGAACAAAAACTGCTACCACTATGATTATAATGTAACA 2580

Query: 3492 GACTGGAGTACATGTCAGCTGAGTGAGAAGGCAGTTTGTGGAAATGGAATAAAAACAAGG 3551
Sbjct: 2581 GACTGGAGTACATGTCAGCTGAGTGAGAAGGCAGTTTGTGGAAATGGAATAAAAACAAGG 2640

Query: 3552 ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 3611
Sbjct: 2641 ATGTTGGATTGTGTTCGAAGTGATGGCAAGTCAGTTGACCTGAAATATTGTGAAGCGCTT 2700

Query: 3612 GGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 3671
Sbjct: 2701 GGCTTGGAGAAGAACTGGCAGATGAACACGTCCTGCATGGTGGAATGCCCTGTGAACTGT 2760

Query: 3672 CAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCCTCACAGGAAAA 3731
Sbjct: 2761 CAGCTTTCTGATTGGTCTCCTTGGTCAGAATGTTCTCAAACATGTGGCCTCACAGGAAAA 2820

Query: 3732 ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 3791
Sbjct: 2821 ATGATCCGAAGACGAACAGTGACCCAGCCCTTTCAAGGTGATGGAAGACCATGCCCTTCC 2880

Query: 3792 CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 3851
Sbjct: 2881 CTGATGGACCAGTCCAAACCCTGCCCAGTGAAGCCTTGTTATCGGTGGCAATATGGCCAG 2940

Query: 3852 TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAACAAGGAACATT 3911
Sbjct: 2941 TGGTCTCCATGCCAAGTGCAGGAGGCCCAGTGTGGAGAAGGGACCAGAACAAGGAACATT 3000

Query: 3912 TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3971
Sbjct: 3001 TCTTGTGTAGTAAGTGATGGGTCAGCTGATGATTTCAGCAAAGTGGTGGATGAGGAATTC 3060

Query: 3972 TGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGGAGGAATCCTGC 4031
Sbjct: 3061 TGTGCTGACATTGAACTCATTATAGATGGTAATAAAAATATGGTTCTGGAGGAATCCTGC 3120

Query: 4032 AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 4091
Sbjct: 3121 AGCCAGCCTTGCCCAGGTGACTGTTATTTGAAGGACTGGTCTTCCTGGAGCCTGTGTCAG 3180

Query: 4092 CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 4151
Sbjct: 3181 CTGACCTGTGTGAATGGTGAGGATCTAGGCTTTGGTGGAATACAGGTCAGATCCAGACCG 3240

Query: 4152 GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 4211
Sbjct: 3241 GTGATTATACAAGAACTAGAGAATCAGCATCTGTGCCCAGAGCAGATGTTAGAAACAAAA 3300

Query: 4272 TCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 4331
Sbjct: 3361 TCCCGAACAGTGTGGTGTCAAAGGTCAGATGGTATAAATGTAACAGGGGGCTGCTTGGTG 3420

Query: 4332 ATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 4391
Sbjct: 3421 ATGAGCCAGCCTGATGCCGACAGGTCTTGTAACCCACCGTGTAGTCAACCCCACTCGTAC 3480

Query: 4392 TGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 4451
Sbjct: 3481 TGTAGCGAGACAAAAACATGCCATTGTGAAGAAGGGTACACTGAAGTCATGTCTTCTAAC 3540

Query: 4452 AGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 4511
Sbjct: 3541 AGCACCCTTGAGCAATGCACACTTATCCCCGTGGTGGTATTACCCACCATGGAGGACAAA 3600

Sbjct: 3601 AGAGGAGATGTGAAAACCAGTCGGGCTGTACATCCAACCCAACCCTCCAGTAACCCAGCA 3660

Query: 4572 GGACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 4631
Sbjct: 3661 GGACGGGGAAGGACCTGGTTTCTACAGCCATTTGGGCCAGATGGGAGACTAAAGACCTGG 3720

Query: 4632 GTTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATTTAT 4691
Sbjct: 3721 GTTTACGGTGTAGCAGCTGGGGCATTTGTGTTACTCATCTTTATTGTCTCCATGATTTAT 3780

Query: 4692 CTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGACTGAAACCTTTA 4751
Sbjct: 3781 CTAGCTTGCAAAAAGCCAAAGAAACCCCAAAGAAGGCAAAACAACCGACTGAAACCTTTA 3840

Query: 4752 ACCTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTGGCAACAACCA 4810
Sbjct: 3841 ACCTTAGCCTATGATGGAGATGCCGACATGTAACATATAACTTTTCCTGGCAACAACCA 3899
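Because the Table 3 HSP is ungapped and Plus/Plus, every query position maps to a subject position by one constant offset (query 912 aligns with subject 1). The minimal Python sketch below, assuming an alignment with no insertions or deletions, shows that mapping and the identity fraction reported in the summary line. The function names are hypothetical illustrations, not part of any BLAST program or API.

```python
from fractions import Fraction


def map_query_to_subject(query_pos: int, query_start: int, subject_start: int) -> int:
    """Map a 1-based query coordinate to its subject coordinate.

    Assumes an ungapped Plus/Plus HSP, so the query-to-subject offset is
    identical at every aligned position.
    """
    return query_pos - query_start + subject_start


def identity(matches: int, aligned_length: int) -> Fraction:
    """Fraction of aligned positions that are identical."""
    return Fraction(matches, aligned_length)


# Table 3: Query 912..4810 aligned to Sbjct 1..3899 without gaps.
print(map_query_to_subject(4810, query_start=912, subject_start=1))  # 3899
print(map_query_to_subject(2412, query_start=912, subject_start=1))  # 1501
print(identity(3899, 3899))  # 1
```

With a gapped alignment the offset changes after each gap, so the mapping would instead have to walk the aligned strings position by position.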



SECP11

A SECP11 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:42) and encoded polypeptide sequence (SEQ ID NO:43) of clone
CG50817-04, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. FIG. 16 illustrates the nucleic acid and amino acid sequences. This clone
includes a nucleotide sequence (SEQ ID NO:42) of 1447 bp. The nucleotide sequence includes
an open reading frame (ORF) beginning with an ATG initiation codon and encoding a polypeptide of
224 amino acid residues (SEQ ID NO:43). The start codon is located at nucleotides 520-522 and
the stop codon is located at nucleotides 1192-1194. Putative untranslated regions, if any, are
found upstream from the initiation codon and downstream from the termination codon. The
protein encoded by clone CG50817-04 is predicted by the PSORT program to localize in the
cytoplasm with a certainty of 0.4500. The PSORT and SignalP programs both predict that the
protein has no amino-terminal signal sequence.
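The stated coordinates are internally consistent: an ORF running from the first base of the start codon (520) to the last base of the stop codon (1194) spans 675 nucleotides, or 225 codons, one of which is the stop codon, leaving 224 encoded residues. A minimal Python sketch of that arithmetic follows; the helper name is hypothetical, and coordinates are assumed to be 1-based and inclusive, as in the text.

```python
def encoded_length(start_codon_first: int, stop_codon_last: int) -> int:
    """Number of amino acid residues encoded by an ORF.

    Coordinates are 1-based and inclusive: the ORF is assumed to run from the
    first base of the ATG initiation codon to the last base of the stop codon.
    """
    span = stop_codon_last - start_codon_first + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} nt is not a whole number of codons")
    # The stop codon terminates translation and contributes no residue.
    return span // 3 - 1


# SECP11 (clone CG50817-04): ATG at 520-522, stop codon at 1192-1194.
print(encoded_length(520, 1194))  # 224
```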

Novel peptidase (HPEP-8)-like proteins are related to conditions of failure to thrive,
nutritional edema, and hypoproteinemia with normal sweat electrolytes, as reported by Townes et
al. (J. Pediat. 71: 220-224, 1967) for 2 affected male infants. This condition could be treated with a
protein hydrolysate diet. Morris and Fisher (Am. J. Dis. Child. 114: 203-208, 1967) reported an
affected female who also had imperforate anus. The condition results from a defect in the synthesis of
enterokinase, which activates proteolytic enzymes produced by the pancreas. Oral pancreatin
represents a therapeutically successful form of enzyme replacement. Trypsin, like elastase, is a
member of the pancreatic family of serine proteases. MacDonald et al. (J. Biol. Chem. 257:
9724-9732, 1982) reported nucleotide sequences of cDNAs representing 2 pancreatic rat
trypsinogens. The trypsin gene is on mouse chromosome 6 (Honey et al., Somat. Cell Molec.
Genet. 10: 369-376, 1984). Carboxypeptidase A and trypsin are a syntenic pair conserved in
mouse and man. Emi et al. (Gene 41: 305-310, 1986) isolated cDNA clones for 2 major human
trypsinogen isozymes from a pancreatic cDNA library. The deduced amino acid sequences had
89% homology and the same number of amino acids (247), including a 15-amino acid signal
peptide and an 8-amino acid activation peptide. Southern blot analysis of human genomic DNA
with the cloned cDNA as a probe showed that the human trypsinogen genes constitute a family
of more than 10. The gene encoding trypsin-1 (TRY1) is also referred to as serine protease-1
(PRSS1). Rowen et al. (Science 272: 1755-1762, 1996) found that there are 8 trypsinogen genes
embedded in the beta T-cell receptor locus or cluster of genes (TCRB) mapping to 7q35. In the
685-kb DNA segment that they sequenced, they found 5 tandemly arrayed 10-kb locus-specific
repeats (homology units) at the 3-prime end of the locus. These repeats exhibited 90 to 91%
overall nucleotide similarity, and embedded within each is a trypsinogen gene. Alignment of
pancreatic trypsinogen cDNAs with the germline sequences showed that these trypsinogen genes
contain 5 exons that span approximately 3.6 kb. They denoted the 8 trypsinogen genes T1 through
T8 from 5-prime to 3-prime. Some of the trypsinogen genes are expressed in nonpancreatic
tissues, where their function is unknown. Rowen et al. (Science 272: 1755-1762, 1996) noted that
the intercalation of the trypsinogen genes in the TCRB locus is conserved in mouse and chicken,
suggesting shared functional or regulatory constraints, as has been postulated for genes in the
major histocompatibility complex (such as class I, II, and III genes) that share similar long-term
organizational relationships. The gene of the invention encodes a novel serine protease that
contains a trypsin domain but maps to chromosome 16.

The sequence of the invention was derived by laboratory cloning of cDNA fragments
covering the full length and/or part of the DNA sequence of the invention, and/or by in silico
prediction of the full length and/or part of the DNA sequence of the invention from public human
sequence databases.

The laboratory cloning was performed using one or more of the methods summarized as follows:
SeqCalling™ Technology, in which cDNA was derived from various human samples representing
multiple tissue types, normal and diseased states, physiological states, and developmental states
from different donors. Samples were obtained as whole tissue, cell lines, primary cells or tissue
cultured primary cells and cell lines. Cells and cell lines may have been treated with biological or
chemical agents that regulate gene expression, for example growth factors, chemokines, or steroids.
The cDNA thus derived was then sequenced using CuraGen's proprietary SeqCalling technology.
Sequence traces were evaluated manually and edited for corrections if appropriate. cDNA
sequences from all samples were assembled with themselves and with public ESTs using
bioinformatics programs to generate CuraGen's human SeqCalling database of SeqCalling
assemblies. Each assembly contains one or more overlapping cDNA sequences derived from one
or more human samples. Fragments and ESTs were included as components of an assembly
when their identity with another component of the assembly was at least 95% over 50 bp.
Each assembly can represent a gene and/or its variants, such as splice forms and/or single
nucleotide polymorphisms (SNPs) and their combinations.
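The inclusion rule for assembly components (at least 95% identity over 50 bp) can be illustrated with a simple check on an already-placed, ungapped overlap. The sketch below is a hypothetical illustration, not CuraGen's proprietary assembler: it assumes the two sequences have already been positioned at a known offset, ignores gaps, and treats 50 bp as the minimum overlap length.

```python
from typing import Tuple


def overlap_identity(a: str, b: str, offset: int) -> Tuple[int, float]:
    """Length and fractional identity of the ungapped overlap when b starts at a[offset]."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    overlap = min(len(a) - offset, len(b))
    if overlap <= 0:
        return 0, 0.0
    window = a[offset:offset + overlap]
    matches = sum(x == y for x, y in zip(window, b[:overlap]))
    return overlap, matches / overlap


def joins_assembly(a: str, b: str, offset: int,
                   min_overlap: int = 50, min_identity: float = 0.95) -> bool:
    """Hypothetical inclusion test: >= 95% identity over an overlap of >= 50 bp."""
    length, ident = overlap_identity(a, b, offset)
    return length >= min_overlap and ident >= min_identity


reference = "ACGT" * 15          # 60-nt toy sequence (illustrative only)
fragment = reference[10:]        # 50-nt fragment placed at offset 10
print(joins_assembly(reference, fragment, 10))       # True
print(joins_assembly(reference, fragment[:40], 10))  # False: overlap is only 40 bp
```

A real assembler would first find the offset, and tolerate gaps, with a local alignment; this sketch only shows how the threshold itself is applied.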

Exon Linking, in which the cDNA coding for the sequence was cloned by polymerase chain
reaction (PCR) using the following primers: 5'GTGCTGACCAAGAGAGCTGGTGAG3' (SEQ
ID NO: 113) and 5'GACAGGGGCAGTAATGCGATTTGC3' (SEQ ID NO: 102) on the
following pools of human cDNAs: Pool 1 - Adrenal gland, bone marrow, brain - amygdala, brain
- cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal
brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland,
pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine,
spinal cord, spleen, stomach, testis, thyroid, trachea, uterus.
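The sketch below computes the GC content of the two Exon Linking primers and the reverse complement of the second one. It assumes, as is conventional for a primer pair but not stated in the text, that the second primer is the reverse primer, so its reverse complement is the sequence expected on the sense strand of the amplified cDNA. The helper names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


forward = "GTGCTGACCAAGAGAGCTGGTGAG"  # SEQ ID NO: 113
reverse = "GACAGGGGCAGTAATGCGATTTGC"  # SEQ ID NO: 102 (assumed reverse primer)
print(f"{gc_content(forward):.1%}")  # 58.3%
print(f"{gc_content(reverse):.1%}")  # 54.2%
print(reverse_complement(reverse))   # GCAAATCGCATTACTGCCCCTGTC
```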

Primers were designed based on in silico predictions for the full length or part (one or
more exons) of the DNA/protein sequence of the invention, or by translated homology of the
predicted exons to closely related human sequences or to sequences from other species. Usually
multiple clones were sequenced to derive the sequence, which was then assembled in a manner
similar to the SeqCalling process. In addition, sequence traces were evaluated manually and
edited for corrections if appropriate.

Variant sequences are also included in this application. A variant sequence can include a
single nucleotide polymorphism (SNP). A SNP can, in some instances, be referred to as a
"cSNP" to denote that the nucleotide sequence containing the SNP originates as a cDNA. A SNP
can arise in several ways. For example, a SNP may be due to a substitution of one nucleotide for
another at the polymorphic site. Such a substitution can be either a transition or a transversion. A
SNP can also arise from a deletion of a nucleotide or an insertion of a nucleotide, relative to a
reference allele. In this case, the polymorphic site is a site at which one allele bears a gap with
respect to a particular nucleotide in another allele. SNPs occurring within genes may result in an
alteration of the amino acid encoded by the gene at the position of the SNP. Intragenic SNPs may
also be silent, however, when a codon including a SNP encodes the same amino acid as
a result of the redundancy of the genetic code. SNPs occurring outside the region of a gene, or in
an intron within a gene, do not result in changes in any amino acid sequence of a protein but may
result in altered regulation of the expression pattern, for example, alterations in temporal
expression, physiological response regulation, cell type expression regulation, intensity of
expression, or stability of the transcribed message.
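The distinctions drawn above (transition versus transversion, and silent versus amino-acid-changing substitutions) can be sketched in Python using the standard genetic code (NCBI translation table 1). This is a minimal illustration that assumes a single-nucleotide substitution inside one codon; insertions, deletions and non-coding SNPs are not modeled, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# the order TTT, TTC, TTA, TTG, TCT, ... with bases ordered T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
PURINES = {"A", "G"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def coding_effect(codon: str, position: int, alt: str) -> str:
    """Effect of replacing codon[position] (0-based) with the base `alt`."""
    ref_aa = CODON_TABLE[codon]
    mutant = codon[:position] + alt + codon[position + 1:]
    alt_aa = CODON_TABLE[mutant]
    if alt_aa == ref_aa:
        return "silent"  # redundancy of the genetic code
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


print(substitution_type("T", "C"))   # transition
print(coding_effect("GCT", 2, "C"))  # silent: GCT and GCC both encode Ala
print(coding_effect("GAT", 2, "A"))  # missense: Asp -> Glu
print(coding_effect("TGG", 2, "A"))  # nonsense: Trp -> stop
```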

The DNA sequence and protein sequence for a novel Peptidase (HPEP-8)-like gene or
one of its splice forms were obtained solely by exon linking and are reported here as CuraGen Acc.
No. CG50817-04.

Real-time expression analysis was performed on SECP11 (clone CG50817-04). The
results demonstrate that RNA homologous to this clone is present in multiple tissue and cell
types.

Accordingly, SECP11 nucleic acids according to the invention can be used to identify
one or more of these tissue types. The presence of RNA sequences homologous to a SECP11
nucleic acid in a sample indicates that the sample contains one or more of the above tissue types.

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1086 of 1087 bases (99%) identical to a human peptidase,
HPEP-8 mRNA (patn:A37664). The full amino acid sequence of the protein of the invention was
found to have 254 of 255 amino acid residues (99%) identical to, and 254 of 257 amino acid
residues (99%) similar to, the 571 amino acid residue Human pro351 protein sequence
(patp:Y41704) from Homo sapiens.

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDom, and Prints, and then
determining the InterPro number by cross-referencing the domain match (or matches) using the
InterPro website. The results indicate that this protein contains the following protein domain (as
defined by InterPro) at the indicated positions: a trypsin domain at amino acid positions 15 to 179.
This indicates that the sequence of the invention has properties similar to those of other proteins
known to contain this domain and similar to the properties of this domain.

Chromosomal information: 

The Peptidase (HPEP-8) disclosed in this invention maps to chromosome 16. This
information was assigned using OMIM, the electronic northern bioinformatic tool implemented
by CuraGen Corporation, public ESTs, public literature references and/or genomic clone
homologies. This was executed to derive the chromosomal mapping of the SeqCalling
assemblies from the genomic clones, literature references and/or EST sequences that were
included in the invention.

Tissue expression 

The Peptidase (HPEP-8) disclosed in this invention is expressed in at least the following
tissues: adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus,
brain - substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal
lung, heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta,
prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis,
thyroid, trachea, uterus. This information was derived by determining the tissue sources of the
sequences that were included in the invention, including but not limited to SeqCalling sources,
public EST sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, PSORT and/or hydropathy profile results for the Peptidase (HPEP-8)-like protein
are shown in Table 7. The results predict that this sequence has no signal peptide and, according
to PSORT, is likely to be localized in the cytoplasm with a certainty of 0.4500.

The proteins of the invention encoded by clone CG50817-04 include the protein
disclosed as being encoded by the ORF described herein, as well as any mature protein arising
therefrom as a result of post-translational modifications. Thus, the proteins of the invention
encompass both a precursor and any active forms of the clone CG50817-04 protein.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase (HPEP-8)-like protein
includes the nucleic acid whose sequence is provided in Figure 16, or a fragment thereof. The
invention also includes a mutant or variant nucleic acid any of whose bases may be changed
from the corresponding base while still encoding a protein that maintains its Peptidase
(HPEP-8)-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that they may be used,
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to 1% of the residues may be so
changed.

The novel protein of the invention includes the Peptidase (HPEP-8)-like protein whose
sequence is provided in Figure 16. The invention also includes a mutant or variant protein any of
whose residues may be changed from the corresponding residue shown in Figure 16 while still
maintaining its Peptidase (HPEP-8)-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 1% of the residues may be so
changed.
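The 1% ceiling on changes can be made concrete. Applied to the 1447-bp nucleotide sequence (SEQ ID NO:42) and the 224-residue polypeptide (SEQ ID NO:43), it allows at most 14 base changes and 2 residue changes, respectively, if the limit is rounded down to a whole number. The text does not say how fractional values are handled, so rounding down is an assumption. The sketch below counts substitutions between equal-length sequences only (insertions and deletions are not handled), and the helper names are hypothetical.

```python
def max_changes(length: int, percent: int = 1) -> int:
    """Largest whole number of positions within `percent`% of `length`."""
    # Integer division rounds down; the rounding rule is an assumption.
    return length * percent // 100


def count_substitutions(reference: str, variant: str) -> int:
    """Number of positions that differ between two equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not indels")
    return sum(r != v for r, v in zip(reference, variant))


def within_variant_limit(reference: str, variant: str, percent: int = 1) -> bool:
    """True if the variant differs at no more than `percent`% of positions."""
    return count_substitutions(reference, variant) <= max_changes(len(reference), percent)


print(max_changes(1447))  # 14 base changes for SEQ ID NO:42
print(max_changes(224))   # 2 residue changes for SEQ ID NO:43
```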

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, and map location for the Peptidase
(HPEP-8)-like protein and nucleic acid disclosed herein suggest that this Peptidase (HPEP-8)
may have important structural and/or physiological functions characteristic of the serine protease
family. Therefore, the nucleic acids and proteins of the invention are useful in potential
diagnostic and therapeutic applications and as a research tool. These include serving as a specific
or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or
amount of the nucleic acid or the protein is to be assessed, as well as potential therapeutic
applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii)
an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid
useful in gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue
regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other like diseases, disorders and
conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic 
methods. 

Table 4. BLASTN identity search for the nucleic acid of the invention versus GenBank.

>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO: 60)
Length = 1661

Plus Strand HSPs:

Score = 5426 (814.1 bits), Expect = 5.1e-240, P = 5.1e-240
Identities = 1086/1087 (99%), Positives = 1086/1087 (99%), Strand = Plus / Plus

Query: 3 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 62
Sbjct: 1 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 60

Sbjct: 61 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 120

Query: 123 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 182
Sbjct: 121 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 180

Query: 183 AGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 242
Sbjct: 181 AGAAGAGAAGGAGCAGAAGGGGAGGGGCCTAACCCTGGGCTGGGGGTTGGACTCACAGGA 240

Query: 243 CTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCT 302
Sbjct: 241 CTGGGGGAAAGAGCTGCAATCAGAGGGTGTCTGCCATAGCTGGGCTCAGGCATCTGTCCT 300

Query: 303 TGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGA 362
Sbjct: 301 TGGCTTTGTTGCCTGGCTCCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGA 360

Query: 363 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 422
Sbjct: 361 CGGACACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGC 420

Query: 423 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 482
Sbjct: 421 TCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGG 480

Query: 483 GGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 542
Sbjct: 481 GGCAGCTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGT 540

Query: 543 AGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTG 602

Query: 663 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 722
Sbjct: 661 GGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGT 720

Query: 723 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 782
Sbjct: 721 AGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTA 780

Query: 783 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 842
Sbjct: 781 CACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACT 840

Query: 843 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 902
Sbjct: 841 GGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGA 900

Query: 903 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 962
Sbjct: 901 GCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGT 960

Query: 963 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1022
Sbjct: 961 GCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGA 1020

Score = 1931 (…39.7 bits), Expect = 3.7e-82, P = 3.7e-82
Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus

Query: 771 GCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCA 830
Sbjct: 990 GCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGAT 1044

Query: 831 GCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CACC 886
Sbjct: 1045 GG-TGTGTAC-CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT-CTGGGGCACC 1101

Query: 887 ACC--TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 944
Sbjct: 1102 ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159

Query: 945 -CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 1001
Sbjct: 1160 GCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGACT 1217

Query: 1002 GCATGCAGCTCCTGGGGGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGTG 1060
Sbjct: 1218 GGGT-CAGCAGTTTGGACTG--G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271

Query: 1061 CTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACCT 1116
Sbjct: 1272 CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 1331

Query: 1117 GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1176
Sbjct: 1332 GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390

Query: 1177 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1236
Sbjct: 1391 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1450

Query: 1237 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1296
Sbjct: 1451 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510

Query: 1297 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1356
Sbjct: 1511 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570

Query: 1357 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1416
Sbjct: 1571 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1630

Query: 1417 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1447
Sbjct: 1631 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661



Table 5. BLASTP identity search for the Peptidase (HPEP-8)-like protein of the invention versus the
Non-Redundant Composite and GenSeq databases.

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO: 61)
Length = 571

Plus Strand HSPs:

Score = 1372 (483.0 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170
Identities = 254/255 (99%), Positives = 254/255 (99%), Frame = +1

Query: 322 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 501
Sbjct: 239 QGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQ 298

Query: 502 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 681
Sbjct: 299 SPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAA 358

Query: 682 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 861
Sbjct: 359 HCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRP 418

Query: 862 LCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP 1041
Sbjct: 419 LCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP 478

Query: 1042 GMVCTSAVGELPSCE 1086
Sbjct: 479 GMVCTSAVGELPSCE 493

Score = 315 (110.9 bits), Expect = 1.5e-170, Sum P(2) = 1.5e-170
Identities = 56/56 (100%), Positives = 56/56 (100%), Frame = +1

Query: 4 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 171
Sbjct: 184 DTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 239

Score = 225 (79.2 bits), Expect = 8.7e-15, P = 8.7e-15
Identities = 71/203 (34%), Positives = 95/203 (46%), Frame = +1

Query: 586 PSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE--EWSVGLGT------RP 741
Sbjct: 63 PGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSVVLGSLQREGLSP 122

Query: 742 --EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWV 915
Sbjct: 123 GAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH----TPLCLPQPAHRFPFGASCWA 178

Query: 916 LGRARPGAGI-SSLQTVPVTLLGPRACS----RLHAAPGGDGSPILPGMVCTSAVGELPS 1080
Sbjct: 179 TGWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN--PARPGMLCG---GPQPG 233

Query: 1081 CEANQPAADRGPGHSQEQENAGRQMALLPLSS 1176
Sbjct: 234 VQGPCQGDSGGPVLCLEPDGHWVQAGIISFAS 265

Score = 102 (35.9 bits), Expect = 7.2e-32, Sum P(2) = 7.2e-32
Identities = 27/84 (32%), Positives = 42/84 (50%), Frame = +1

Query: 295 SVLGFVAWLQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQAR 474
Sbjct: 484 SAVGELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS- 541

Query: 475 VQGAAFLAQSPETPEMSDEDSCVA 546
Sbjct: 542 LDWQVYFAEEPE-PE-AEPGSCLA 563



Table 6. BLASTN identity search versus the human SeqCalling database for the Peptidase (HPEP-8)-like
protein of the invention.

>s3aq:132854740 Category D: 12 frag (12 non-5' sig-CG), 636 bp. (SEQ ID NO: 62)
Length = 636

Minus Strand HSPs:

Score = 1423 (213.5 bits), Expect = 7.0e-59, P = 7.0e-59
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus

Query: 1001 AGCCGGCTGCAG-GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 943
Sbjct: 295 AGCTGGCTGCCCCGGCCT-GCAGGTTGGATGGACAGCAGCCCTGGCCCT-GTGCCCACCT 352

Query: 942 GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 883
Sbjct: 353 ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 412

Query: 882 GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 823

Query: 822 CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGT

Query: 762 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTA
Sbjct: 533 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 592

Query: 702 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 659
Sbjct: 593 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 636

Score = 757 (113.6 bits), Expect = 1.7e-28, P = 1.7e-28 (SEQ ID NO: 103)
Identities = 165/179 (92%), Positives = 165/179 (92%), Strand = Minus / Plus

Query: 1116 AGGTCCCCTGTCAGCAGCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 1057

Query: 1056 GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 997
Sbjct: 163 GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG 222

Query: 996 GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 938
Sbjct: 223 GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG 281

>s3aq:134913963 Category E: 1 frag (1 non-CG EST), 415 bp.
Length = 415 (SEQ ID NO: 104)

Plus Strand HSPs:

Score = 297 (44.6 bits), Expect = 1.1e-06, P = 1.1e-06
Identities = 61/63 (96%), Positives = 61/63 (96%), Strand = Plus / Plus

Query: 1385 TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 1444
Sbjct: 10 TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 69

Query: 1445 ATT 1447
Sbjct: 70 ATT 72
Table 7. ClustalW alignment of the protein of the invention with similar peptidase (HPEP-8) proteins.

ClustalW alignment of the protein of the invention.

[ClustalW alignment of CG50817-04, Y41704 and Y90291; the shaded alignment rows are not recoverable from the source text.]

Information for the ClustalW proteins:

Accno                        Common Name                                  Length
CG50817-04 (SEQ ID NO:43)    novel Peptidase (HPEP-8)-like protein
Y41704 (SEQ ID NO:122)       Human PRO351 protein sequence                571
Y90291 (SEQ ID NO:123)       Human peptidase, HPEP-8 protein sequence     267



In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g., L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function. PSORT, SignalP and hydropathy results for the Peptidase
(HPEP-8)-like protein of the invention are shown below.
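The idea of a substitution with "comparable steric and/or chemical properties" can be made concrete with a small lookup. The sketch below is illustrative only: the residue groups are an assumed, commonly used grouping and are not defined in this application (only the L to V, I, or M example comes from the text above), and the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical.

```python
# Illustrative sketch: these residue groups are an ASSUMED grouping, not one
# defined in this application. Only L -> V, I, M is taken from the text.
CONSERVATIVE_GROUPS = [
    frozenset("LVIM"),  # aliphatic hydrophobic (covers the L -> V, I, M example)
    frozenset("FYW"),   # aromatic
    frozenset("KRH"),   # basic
    frozenset("DE"),    # acidic
    frozenset("ST"),    # small, hydroxyl-bearing
    frozenset("NQ"),    # amide side chains
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both one-letter residues fall in the same assumed group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for target in "VIMD":
        print(f"L -> {target}: {is_conservative('L', target)}")
```

Any real assessment of which positions tolerate substitution would rest on the shading in the alignment itself, not on a fixed table like this one.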

Table 8. PSORT, SignalP and Pfam results for CG50817-04, Peptidase (HPEP-8)-like protein.

PSORT data:

cytoplasm --- Certainty=0.4500(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.3000(Affirmative) < succ>
lysosome (lumen) --- Certainty=0.2415(Affirmative) < succ>
mitochondrial matrix space --- Certainty=0.1000(Affirmative) < succ>

SignalP data:

# Measure   Position   Value   Cutoff   Conclusion
  max. C    57         0.130   0.37     NO
  max. Y    55         0.066   0.34     NO
  max. S    32         0.311   0.88     NO
  mean S    1-54       0.142   0.48     NO

PFAM data:

Scores for sequence family classification (score includes all domains):
Model     Description   Score   E-value   N
trypsin   Trypsin       69.7    2.7e-21   1
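Each SignalP row in Table 8 compares a score with a cutoff. A minimal sketch of that comparison, assuming a row is called "YES" when its value reaches or exceeds the cutoff (the exact SignalP decision rule is not stated in this document), reproduces the four "NO" conclusions from the tabulated values. The row tuples restate Table 8; `signalp_conclusion` and `conclusions` are hypothetical helpers.

```python
# Values restated from Table 8 (CG50817-04): (measure, position, value, cutoff).
SIGNALP_ROWS = [
    ("max. C", "57", 0.130, 0.37),
    ("max. Y", "55", 0.066, 0.34),
    ("max. S", "32", 0.311, 0.88),
    ("mean S", "1-54", 0.142, 0.48),
]


def signalp_conclusion(value: float, cutoff: float) -> str:
    """Assumed rule: a measure is positive when its value reaches the cutoff."""
    return "YES" if value >= cutoff else "NO"


def conclusions(rows):
    """Map each measure name to its YES/NO call under the assumed rule."""
    return {measure: signalp_conclusion(value, cutoff)
            for measure, _position, value, cutoff in rows}


if __name__ == "__main__":
    for measure, call in conclusions(SIGNALP_ROWS).items():
        print(measure, call)
```

Because every measure falls below its cutoff, no row is called positive, which matches the "NO" entries in the table.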



SECP12 

A SECP12 nucleic acid and polypeptide according to the invention includes the
nucleic acid sequence (SEQ ID NO:44) and encoded polypeptide sequence (SEQ ID NO:45) of
clone CG50817-05, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids
encoding them. This is a related variant of SECP11, clone CG50817-04. Figure 17 illustrates
the nucleic acid and amino acid sequences, respectively. This clone includes a
nucleotide sequence (SEQ ID NO:44) of 1592 bp. The nucleotide sequence includes an open
reading frame (ORF) beginning with an ATG initiation codon at nucleotides 19-21 and ending
with a TGA codon at nucleotides 1582-1584. The encoded protein, having 521 amino acid
residues, is presented using the one-letter code in Figure 17.
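The ORF coordinates and the protein length stated above are consistent with each other. The span from the first base of the ATG (nucleotide 19) to the last base of the TGA (nucleotide 1584) is 1566 nt, or 522 codons, and the stop codon contributes no residue, which leaves 521 amino acids. The minimal sketch below shows that arithmetic, together with a naive forward-strand ORF scan. The function names are hypothetical, and the scan is a simplification, not the procedure used to define this clone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length_from_orf(atg_start: int, stop_end: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates running
    from the first base of the ATG to the last base of the stop codon."""
    span = stop_end - atg_start + 1
    if span < 6 or span % 3 != 0:
        raise ValueError("ORF span must be a whole number of codons, start plus stop")
    return span // 3 - 1  # the stop codon encodes no amino acid


def first_orf(seq: str):
    """Naive forward-strand scan: return 1-based (start, end) of the first ATG
    that has an in-frame stop codon downstream, or None if there is none."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start + 1, i + 3
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    # CG50817-05: ATG at nucleotides 19-21, TGA at nucleotides 1582-1584.
    print(protein_length_from_orf(19, 1584))
    print(first_orf("CCATGAAATTTTGACC"))
```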

The protein encoded by clone CG50817-05 is predicted by the PSORT program to
localize in the plasma membrane with a certainty of 0.6850, and appears to be a signal protein 
(see Table 13 below). 






The sequence identified by exon linking was extended in silico using information from at 
least some of the following sources: SeqCalling assemblies 153687026, 152507187, 153485867, 
153485864 and genomic clone gb_AC009088.5.

The genomic clone was analyzed by Genscan, Grail and/or other programs to identify
regions that were putative exons, i.e., putative coding sequences. The clone was also analyzed
by TBLASTN, TFASTN, TFASTA, BLASTX and/or other (i.e., hybrid) programs to identify
genomic regions translating to proteins with similarity to the original protein or protein family of
interest. The following genomic sequence was thus included in the invention: gb_AC009088.5.

The DNA sequence and protein sequence for a novel Peptidase-like gene or one of its
splice forms thus derived are reported here as the invention CG50817-05. Genomic clones having
regions with 100% identity to the extended sequence thus obtained were identified by BLASTN
searches with the extended sequence against human genomic databases. The genomic clone was
selected for further analysis because this identity indicates that these clones contain the genomic
locus for these SeqCalling assemblies.

The regions defined by all approaches were then manually integrated and manually
corrected for apparent inconsistencies that may have arisen, for example, from miscalled bases in
the original fragments used, or from discrepancies in predicted homology to a similar protein,
to derive the final sequence of the invention CG50817-05 reported here. When
necessary, the process to identify and analyze SeqCalling assemblies, ESTs and genomic clones
was reiterated to derive the full-length sequence.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 1135 of 1140 bases (99%) identical to a gb:GENBANK-ID:
Z34002 human PRO351 nucleotide sequence mRNA from Homo sapiens (Table 9). The full amino
acid sequence of the protein of the invention was found to have 476 of 493 amino acid residues
(96%) identical to, and 479 of 493 amino acid residues (97%) similar to, the 571 amino acid
residue patp:Y41704 human PRO351 protein from Homo sapiens (Table 10).
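The percentages quoted above can be recomputed from the counts. The values in this document appear to be truncated rather than rounded. For example, 476/493 is 96.55% and is reported as 96%, and 61/63 in Table 6 is 96.8% and is reported as 96%. The sketch below assumes that truncation convention. `count_identities` is a hypothetical helper that counts identical, non-gap columns in two gapped rows of equal length, such as the Query/Sbjct lines of a BLAST alignment.

```python
from typing import Tuple


def percent_identity(identical: int, aligned_length: int) -> int:
    """Percent identity truncated to an integer (assumed reporting convention)."""
    return identical * 100 // aligned_length


def count_identities(query: str, sbjct: str) -> Tuple[int, int]:
    """Count identical, non-gap columns in two gapped alignment rows of equal length."""
    if len(query) != len(sbjct):
        raise ValueError("alignment rows must have the same length")
    identical = sum(
        1 for q, s in zip(query.upper(), sbjct.upper()) if q == s and q != "-"
    )
    return identical, len(query)


if __name__ == "__main__":
    print(percent_identity(1135, 1140))  # nucleotide identity to Z34002
    print(percent_identity(476, 493))    # amino acid identity to Y41704
    print(percent_identity(479, 493))    # amino acid similarity to Y41704
```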
A multiple sequence alignment is given in Table 12, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 

The presence of identifiable domains in the protein disclosed herein was determined by
searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain, Prints and then
determining the Interpro number by crossing the domain match (or numbers) using the Interpro
website. The results indicate that this protein contains the following protein domains (as defined
by Interpro) at the indicated positions: domain name trypsin at amino acid positions 61 to 279,
and 312 to 476. This indicates that the sequence of the invention has properties similar to those
of other proteins known to contain this/these domain(s) and similar to the properties of these
domains.

Chromosomal information: 

The Peptidase disclosed in this invention maps to chromosome 16. This information was
assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen
Corporation, public ESTs, public literature references and/or genomic clone homologies. This
was executed to derive the chromosomal mapping of the SeqCalling assemblies, genomic
clones, literature references and/or EST sequences that were included in the invention.

Tissue expression 

The Peptidase disclosed in this invention is expressed in at least the following tissues:
adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain -
substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung,
heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate,
salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid,
trachea, uterus. This information was derived by determining the tissue sources of the sequences
that were included in the invention, including but not limited to SeqCalling sources, public EST
sources, and/or RACE sources.

Cellular Localization and Sorting 

The SignalP, Psort and/or Hydropathy profiles for the Peptidase-like protein are shown in
Table 13. The results predict that this sequence has a signal peptide with a cleavage site between
positions 35 and 36 and is likely to be localized at the plasma membrane with a certainty of
0.6850.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the
nucleic acid whose sequence is provided in Figure 17, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 17, while still encoding a protein that maintains its
Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The
invention further includes nucleic acids whose sequences are complementary to those just
described, including nucleic acid fragments that are complementary to any of the nucleic acids
just described. The invention additionally includes nucleic acids or nucleic acid fragments, or
complements thereto, whose structures include chemical modifications. Such modifications
include, by way of non-limiting example, modified bases, and nucleic acids whose sugar
phosphate backbones are modified or derivatized. These modifications are carried out at least in
part to enhance the chemical stability of the modified nucleic acid, such that they may be used,
for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the
mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so
changed.

The novel protein of the invention includes the Peptidase-like protein whose sequence is
provided in Figure 17. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 17 while the protein
still maintains its Peptidase-like activities and physiological functions, or a functional
fragment thereof. In the mutant or variant protein, up to about 4% of the residues may be so
changed.
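Checking whether a substitution-only variant stays within a stated ceiling (about 1% for the nucleic acid, about 4% for the protein) comes down to counting mismatched positions against the reference. The sketch below is illustrative. It assumes equal-length sequences with no insertions or deletions, and the function names are hypothetical. As example input it reuses the one mismatched 60-residue row of Table 5 (an A/P difference).

```python
def percent_substituted(reference: str, variant: str) -> float:
    """Percentage of positions at which an equal-length variant differs from the reference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    if not reference:
        raise ValueError("reference sequence is empty")
    changed = sum(1 for r, v in zip(reference.upper(), variant.upper()) if r != v)
    return 100.0 * changed / len(reference)


def within_limit(reference: str, variant: str, limit_percent: float) -> bool:
    """True if the variant changes no more than limit_percent of the positions."""
    return percent_substituted(reference, variant) <= limit_percent


if __name__ == "__main__":
    # Example input only: the Query 862 / Sbjct 419 rows of Table 5.
    sbjct_row = "LCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP"
    query_row = "LCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILP"
    print(round(percent_substituted(sbjct_row, query_row), 2),
          within_limit(sbjct_row, query_row, 4.0))
```

Variants that carry insertions or deletions would first need an alignment, and this sketch does not attempt one.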

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
F(ab')2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, and map location for the
Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have
important structural and/or physiological functions characteristic of the serine protease family.
Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications and as a research tool. These include serving as a specific or selective
nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of
the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications
such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody
target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in
gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration
in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and
therapeutic applications implicated in various diseases and disorders described below and/or
other pathologies. For example, the compositions of the present invention will have efficacy for
treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis;
myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS;
anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma;
infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage
disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and conditions
of the like.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic
methods.

Table 9. BLASTN identity search for the nucleic acid of the invention. 

>patn:Z34002 Human PRO351 nucleotide sequence - Homo sapiens, 2365 bp. (SEQ ID NO: 63)
Length = 2365

Plus Strand HSPs:

Score = 5649 (847.6 bits), Expect = 4.3e-288, Sum P(2) = 4.3e-288
Identities = 1135/1140 (99%), Positives = 1135/1140 (99%), Strand = Plus / Plus

Query: 340 TCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 399
Sbjct: 639 TGCAGCGTGAGGGACTCAGCCC-TGGGGCCGAAGAGGTGGGGGTGGCTGCCCTGCAGTTG 697

Query: 400 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 459
Sbjct: 698 CCCAGGGCCTATAACCACTACAGCCAGGGCTCAGACCTGGCCCTGCTGCAGCTCGCCCAC 757

Query: 460 CCCACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCC 519
Sbjct: 758 CCCACGACCCACACACCCCTCTGCCTGCCCCAGCCCGCCCATCGCTTCCCCTTTGGAGCC 817

Query: 520 TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 579
Sbjct: 818 TCCTGCTGGGCCACTGGCTGGGATCAGGACACCAGTGATGCTCCTGGGACCCTACGCAAT 877

Query: 580 CTGCGCCTGCGTCTCATCAGTCGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAG 639
Sbjct: 878 CTGCGCCTGCGTCTCATCAGTCGCCCCACATGTAACTGTATCTACAACCAGCTGCACCAG 937

Query: 640 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 699
Sbjct: 938 CGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTG 997

Query: 700 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 759
Sbjct: 998 CAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACAC 1057

Query: 760 TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 819
Sbjct: 1058 TGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTG 1117

Query: 820 CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 879
Sbjct: 1118 CTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCT 1177

Query: 880 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 939
Sbjct: 1178 TTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGT 1237

Query: 940 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 999
Sbjct: 1238 GGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCC 1297

Query: 1000 AGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTG 1059
Sbjct: 1298 AGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTG 1357

Query: 1060 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1119
Sbjct: 1358 CTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTG 1417

Query: 1120 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1179
Sbjct: 1418 GGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCAC 1477

Query: 1180 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCC 1239
Sbjct: 1478 CCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCC 1537

Query: 1240 AGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1299
Sbjct: 1538 AGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTGGC 1597

Query: 1300 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1359
Sbjct: 1598 TGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTG 1657

Query: 1360 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1419
Sbjct: 1658 ACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGC 1717

Query: 1420 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCC 1479
Sbjct: 1718 CCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGC 1777

Score = 948
Identities = 882/1448 (60%), Positives = 882/1448 (60%), Strand = Plus / Plus

Query: 110 TCACCACCTATGCTATCAACGTGAGCCTGATGTGGCTCAGTTT-CCGGAAGGTCCAAGAA 168
Sbjct: 386 TGACCTCATCTGCTTTGCTT-TGGTCTTCAAGCCGCTCAGCGTGCCTGT-GGACAGCGTG 443

Query: 169 CCCCAGGGCCAACCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 228
Sbjct: 444 GCCCCGGCCCC-CCCAAGCCTCAGGAGGGCAACACAGTCCCTGGCGAGTGGCCCTGGCAG 502

Query: 229 GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 288
Sbjct: 503 GCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGGCTCCCTGGTGGCAGACACCTGG 562

Query: 289 GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTGCGTG 348
Sbjct: 563 GTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAGCAACAGAACTGAATTCCTG-GTC 621



M III 



I nil I II II II 



I I II I III 



II 



II iiiiiii III I I II I III I II III mm I I I 



II II II I III I I I II I II I I 
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II I II nil 



10 



15 



20 



25 



30 



35 



40 



45 



50 



55 



60 



65 



Sbjct: 1016 --GATTCCGGGGGCC-CTGTGCTGTGCCTCGAGCCTGA-CGGACACTGGGTTCAGGCTG- 1070

Query: 799 GCC-CAGGAGGAC-GCTCCTGTGCTGCTGACCAACACAGCTGCTCACAGTTC--CTGGCT 854

Query: 1079 TCATTGGGCGCCAG-GCCC-CAGAGGAATGGAGCGT-AGGGCTG-G-GGACCAGACCGGA 1133
Sbjct: 1352 GCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTA 1411

Query: 1305 TCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCT 1364
Sbjct: 1583 GATGGG--GAGCGTGGCTGGGTTCTGGGACGGGCCCGC-CCAGG-AGCAGGCATCAGCTC 1638

Query: 1423 ATTCTGCCGGGGATGGTGTGTACCAGT--GCTGTGGGTGAGCTGC-CCAG--CTGTGAGG 1477
Sbjct: 1698 CTCCTGGGGGTGATGGCA-GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTG-GG 1755

Query: 1478 CCAACCAACCAGCTGCTGACAGGGGACCTGGC-CATTCTCAGGAACAAGAGAATGCAGGC 1536
Sbjct: 1756 TGAGCTGCCCAGCTG-TGAGGGCCTGTCTGGGGCAC-CACTGGTGCATGAGG-TG-AGG- 1810

Sbjct: 1811 -GGCACATGG--TTCCTGGCC 1828

Score = 894

Query: 1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
Sbjct: 171 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 230

Query: 61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
Sbjct: 231 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 290

Query: 121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180
Sbjct: 291 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCAAG 350

Score = 699, Expect = 9.8e-60, Sum P(2) = 9.8e-60 (SEQ ID NO:107)
Identities = 391/603 (64%), Positives = 391/603 (64%), Strand = Plus / Plus

Query: 1045 TCA-GAGGAGGCGGTGC-TAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCC-CAGAG 1101

Query: 1102 GAATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT--CC 1159
Sbjct: 1623 GAGCAG-GCATCAG-CTCCCT-CCAGACAGTGCCCGTGAC-CCTCCTGGGGCCTAGGGCC 1678

Query: 1160 TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 1219
Sbjct: 1679 TGCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGA 1733

Query: 1334 T-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 1390



>patn:A37664 Human peptidase, HPEP-8 coding sequence - Homo sapiens, 1661 bp. (SEQ ID NO: 64)
Length = 1661

Plus Strand HSPs:

Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus

Query: 712 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 771
Sbjct: 320 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 379

Query: 772 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 831
Sbjct: 380 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 439

Query: 832 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 891
Sbjct: 440 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 499

Query: 892 AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 951
Sbjct: 500 AGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGG 559

Query: 952 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 1011
Sbjct: 560 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 619

Query: 1012 CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCC 1071
Sbjct: 620 CAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCCGTCCTAACTGCTGCC 679

Query: 1072 CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 1131
Sbjct: 680 CACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGCTGGGGACCAGACCG 739

Query: 1132 GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 1191
Sbjct: 740 GAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCCACCCTGAGGGGGGC 799

Query: 1192 TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 1251
Sbjct: 800 TACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCC 859

Query: 1252 CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 1311
Sbjct: 860 CTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGA 919

Query: 1312 CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 1371
Sbjct: 920 CGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCGTGACCCTCCTGGGG 979

Query: 1372 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1431
Sbjct: 980 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039

Score = 974, Expect = 6.1e-39, P = 6.1e-39 (SEQ ID NO:108)
Identities = 632/998 (63%), Positives = 632/998 (63%), Strand = Plus / Plus

Query: 546 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 605
Sbjct: 1 GGACACCAGTGATGCTCCTGGGACCCTACGCAATCTGCGCCTGCGTCTCATCAGTCGCCC 60

Query: 606 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 665
Sbjct: 61 CACATGTAACTGTATCTACAACCAGCTGCACCAGCGACACCTGTCCAACCCGGCCCGGCC 120

Sbjct: 121 TGGGATGCTATGTGGGGGCCCCCAGCCTGGGGTGCAGGGCCCCTGTCAGGTCTGATAGGG 180



Query: 782 G--CTTTGCATCA-AGCTGTGCCCAGGAGGACGCTCCTGTGCT-GCTGACCA-ACACAGC 836

Score = 706, Expect = 1.9e-23, P = 1.9e-23 (SEQ ID NO:109)
Identities = 390/603 (64%), Positives = 390/603 (64%), Strand = Plus / Plus

Query: 1103 AATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT--CCT 1160



Query: 1221 GCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CACC 1276
Sbjct: 1045 GG-TGTGTAC-CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT-CTGGGGCACC 1101

Query: 1277 ACC--TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 1334
Sbjct: 1102 ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159

Query: 1392 GCATGCAGCTCCTGGGCGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGTG 1450
Sbjct: 1218 GGGT-CAGCAGTTTGGACTG--G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271

Query: 1451 CTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACCT 1506
Sbjct: 1272 CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 1331

Query: 1507 GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1566
Sbjct: 1332 GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390

Sbjct: 1391 CCACCCTGTCATGTGTGATTCCAGGC 1416

Score = 481, Expect = 1.1e-12, P = 1.1e-12 (SEQ ID NO: 110)
Identities = 409/666 (61%), Positives = 409/666 (61%), Strand = Plus / Plus

Query: 207 CCCTGGCGAGTGGCCCTGGCAGGCCAGTGTGAGGAGGCAAGGAGCCCACATCTGCAGCGG 266
Sbjct: 584 CCCTCCCCA-TGGCCCTGGGAGGCCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGG 642

Query: 267 CTCCCTGGTGGCAGACACCTGGGTCCTCACTGCTGCCCACTGCTTTGAAAAGGCAGCAG- 325
Sbjct: 643 AGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTC-ATTGGGCGCCAGG 701

Query: 326 CAACAGAACTGAATTCCTGCGTGAGGGACTCAGCCCCTGGGGCCGAAGAGGTGGGGGTGG 385
Sbjct: 702 CCCCAGAG--GAATGGA-GCGT-AGGG-CTGGGGACCAGAC-CGGAGGAG-TGGGGCCTG 754

Query: 386 CTGCC-CTGCAGT-TGCCCAGGGCCTATAACCACTAC-AGCCAGGGCTCAGACCTGGCCC 442

Sbjct: 984 GGGC-CTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGC-AGCCCTATTCTGCCGGG 1041

Query: 728 G-CC-CTG-TGC-TGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCTGG-CATCATCAG 782
Sbjct: 1098 CACCACTGGTGCATGAGGTGAGGGGCACATGGTTCCTGGC--CGGGCTGCACAGCTTCGG 1155



Figure 10. BLASTP identity search for the protein of the invention.

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO: 65)
Length = 571

Plus Strand HSPs:

Score = 2544 (895.5 bits), Expect = 1.1e-263, P = 1.1e-263
Identities = 476/493 (96%), Positives = 479/493 (97%), Frame = +1

Query: 199 NTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNS--CVRDS 357
Sbjct: 60 NTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSWLGSLQREG 119

Sbjct: 120 LSPGAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTHTPLCLPQPAHRFPFGASCWAT 179

Query: 535 GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 714
Sbjct: 180 GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSNPARPGMLCGGPQPGVQGPCQ 239

Query: 715 GDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 894
Sbjct: 240 GDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLAQS 299

Query: 895 PETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAH 1074
Sbjct: 300 PETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAH 359

Query: 1255 CLPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 1434
Sbjct: 420 CLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPG 479

Query: 1435 MVCTSAVGELPSCE 1476
Sbjct: 480 MVCTSAVGELPSCE 493

Score = 324 (114.1 bits), Expect = 7.0e-26, P = 7.0e-26 (SEQ ID NO:111)
Identities = 91/250 (36%), Positives = 123/250 (49%), Frame = +1

Query: 187 PQEGNTVPGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSCVRDSAPG 366
Sbjct: 322 PQAG--APSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRP- 378

Query: 367 AEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH----TPLCLPQPAHRFPFGASCWAT 534
Sbjct: 379 -EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVL 437

Query: 535 GWDQDTSDAPGTLRNLRLRLISRPTCNCIYNQLHQRHLSN--PARPGMLCGGPQPGVQGP 708
Sbjct: 438 GRARPGAGI-SSLQTVPVTLLGPRACS----RLHAAPGGDGSPILPGMVCTSAV-GELPS 491

Query: 709 CQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTNTAAHSSWLQARVQGAAFLA 888
Sbjct: 492 CEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS-LDWQVYFA 549

Query: 889 QSPETPEMSDEDSCVA 936
Sbjct: 550 EEPE-PE-AEPGSCLA 563



>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 267 aa. (SEQ ID NO:66)
Length = 267

Plus Strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +1

Query: 910 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 1089
Sbjct: 1 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 60

Query: 1090 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 1269
Sbjct: 61 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120

Query: 1270 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 1449
Sbjct: 121 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180

Query: 1450 AVGELPSCE 1476
Sbjct: 181 AVGELPSCE 189

Score = 316, Expect = 4.2e-27, P = 4.2e-27 (SEQ ID NO:112)
Identities = 90/250 (36%), Positives = 122/250 (48%), Frame = +1

Sbjct: 188 CEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTALPAYEDWVSS-LDWQVYFA 245

Query: 889 QSPETPEMSDEDSCVA 936
Sbjct: 246 EEPE-PE-AEPGSCLA 259

Table 11. BLASTN identity search (versus the human SeqCalling database) for the Peptidase-like protein of the invention.

>s3aq:153687026 Category D: 377 frag (6 5'sig-CG, 204 non-5'sig-CG, 167 non-CG EST), 1114 bp. (SEQ ID NO:67)
Length = 1114

Minus Strand HSPs:

Score = 894 (134.1 bits), Expect = 3.1e-35, P = 3.1e-35
Identities = 182/186 (97%), Positives = 182/186 (97%), Strand = Minus / Plus

Query: 186 CTTGGGTTGGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 127
Sbjct: 413 CTTAGCCTTGCCCTGGGGTTCTTGGACCTTCCGGAAACTGAGCCACATCAGGCTCACGTT 472

Query: 126 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 67
Sbjct: 473 GATAGCATAGGTGGTGATACAAACAATGCAGAAATCATAGAGCACGAAGAACAGGATCCA 532

Query: 66 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 7
Sbjct: 533 GGCCAGGTAGACAGAACCAGCGAGAGACACCAGGGAGCTCAGCAGCATCAGGACAGAGGC 592

Query: 6 CCAGCG 1
Sbjct: 593 CCAGCG 598

>s3aq:152507187 17 frag (1 5'sig-CG, 7 non-5'sig-CG, 9 non-CG EST), 588 bp. (SEQ ID NO:68)
Length = 588

Plus Strand HSPs:

Score = 882 (132.3 bits), Expect = 2.1e-34, P = 2.1e-34
Identities = 178/180 (98%), Positives = 178/180 (98%), Strand = Plus / Plus

Query: 1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
Sbjct: 367 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGTTTCTGTCTAC 426

Query: 61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
Sbjct: 427 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 486

Query: 121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA 180
Sbjct: 487 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGGCAA 546

>s3aq:153485867 Category D: 3 frag (1 non-5'sig-CG, 2 non-CG EST), 612 bp. (SEQ ID NO:69)
Length = 612

Plus Strand HSPs:

Score = 785 (117.8 bits), Expect = 1.7e-29, P = 1.7e-29
Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus

Query: 1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
Sbjct: 456 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 515

Query: 61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
Sbjct: 516 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 575

Query: 121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
Sbjct: 576 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 612

>s3aq:153485864 Category D: 2 frag (2 non-5'sig-CG), 425 bp. (SEQ ID NO:70)
Length = 425

Plus Strand HSPs:

Score = 785 (117.8 bits), Expect = 2.4e-29, P = 2.4e-29
Identities = 157/157 (100%), Positives = 157/157 (100%), Strand = Plus / Plus

Query: 1 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 60
Sbjct: 269 CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC 328

Query: 61 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 120
Sbjct: 329 CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT 388

Query: 121 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 157
Sbjct: 389 GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGA 425


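For the first Table 11 hit the strand is Minus / Plus, so the query coordinates count down (186 to 127, 126 to 67, and so on) and each printed query row should be the reverse complement of the matching stretch of the plus-strand query. The Python sketch below checks this for the Query 66 to 7 row. It assumes that this query is the same sequence whose plus-strand bases 1-66 appear in the Plus / Plus hits of the same table; the constant and helper names are illustrative and are not part of the original analysis.

```python
# Plus-strand query bases 1-66, as printed in the Plus / Plus rows of Table 11
QUERY_PLUS_1_66 = (
    "CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC"  # 1-60
    "CTGGCC"                                                        # 61-66
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def minus_strand_row(plus_seq: str, high: int, low: int) -> str:
    """Query text shown on the minus strand for 1-based positions high..low (high > low)."""
    # Convert BLAST's 1-based inclusive coordinates to a half-open Python slice.
    return reverse_complement(plus_seq[low - 1:high])


row_66_7 = minus_strand_row(QUERY_PLUS_1_66, 66, 7)
print(row_66_7)                                  # compare with the "Query: 66 ... 7" row
print(minus_strand_row(QUERY_PLUS_1_66, 6, 1))   # compare with the "Query: 6 ... 1" row
```

If the assumption holds, both printed strings match the corresponding Minus / Plus query rows character for character.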

Table 12. ClustalW alignment of the protein of the invention.

[Alignment graphic comparing CG50817-05, Y41704 and Y90291]



Information for the ClustalW proteins:

Accno                        Common Name                                 Length
CG50817-05 (SEQ ID NO:45)    novel Peptidase-like protein
Y41704 (SEQ ID NO:122)       Human PRO351 protein sequence               571
Y90291 (SEQ ID NO:123)       Human peptidase, HPEP-8 protein sequence    267



In the alignment shown above, black outlined amino acid residues indicate regions of
conserved sequence (i.e., regions that may be required to preserve structural or functional
properties); greyed amino acid residues can be mutated to a residue with comparable steric
and/or chemical properties without altering protein structure or function (e.g., L to V, I, or M);
non-highlighted amino acid residues can potentially be mutated to a much broader extent without
altering structure or function.



Table 13. PSORT, SignalP and hydropathy results for CG50817-05

plasma membrane Certainty=0.6850 (Affirmative) < succ>
endoplasmic reticulum (membrane) Certainty=0.6400 (Affirmative) < succ>
Golgi body Certainty=0.3700 (Affirmative) < succ>
microbody (peroxisome) Certainty=0.1187 (Affirmative) < succ>

INTEGRAL Likelihood = -8.44 Transmembrane 15 - 31 (1 - 38)

Seems to be a Type II (Ncyt Cexo) membrane protein

Is the sequence a signal peptide?
# Measure  Position  Value  Cutoff  Conclusion
  max. C   36        0.688  0.37    YES
  max. Y   36        0.555  0.34    YES
  max. S   10        0.991  0.88    YES
  mean S   1-35      0.875  0.48    YES
# Most likely cleavage site between pos. 35 and 36: TYA-IN



[Hydropathy plot for CG50817-05; x-axis: Amino Acid Number]


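The SignalP cleavage context can be cross-checked against the nucleotide data in the alignments above. The Python sketch below rests on two assumptions that the source does not state outright: (i) the 180-nt query segment shown in the BLASTN alignments is the 5' end of the CG50817-05 clone, and (ii) translation starts at nucleotide 19, as the "Query: 19 / Sbjct: 1" row of the BLASTP search (Frame = +1) suggests. It translates that segment with the standard genetic code and prints the residues flanking the predicted cleavage site between positions 35 and 36. The names `translate` and `cleavage_context` are illustrative helpers.

```python
# Query nucleotides 1-180 as shown in the BLASTN alignments (assumed to be CG50817-05)
SEGMENT = (
    "CGCTGGGCCTCTGTCCTGATGCTGCTGAGCTCCCTGGTGTCTCTCGCTGGTTCTGTCTAC"
    "CTGGCCTGGATCCTGTTCTTCGTGCTCTATGATTTCTGCATTGTTTGTATCACCACCTAT"
    "GCTATCAACGTGAGCCTGATGTGGCTCAGTTTCCGGAAGGTCCAAGAACCCCAGGGCCAA"
)

# Standard genetic code; codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):  # ignores any trailing partial codon
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def cleavage_context(protein: str, after: int, left: int = 3, right: int = 2) -> str:
    """Residues around a cleavage site between 1-based positions `after` and `after + 1`."""
    return protein[after - left:after] + "-" + protein[after:after + right]


START_NT = 19  # assumed first base of the ATG (1-based)
protein = translate(SEGMENT[START_NT - 1:])
print(protein)
print(cleavage_context(protein, 35))  # compare with the SignalP line "TYA-IN"
```

Under these assumptions the translated N-terminus reproduces the SignalP context TYA-IN, and residues 15-31 (LAWILFFVLYDFCIVCI) coincide with the hydrophobic stretch that PSORT reports as the transmembrane segment.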

SECP13 



A SECP13 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:46) and encoded polypeptide sequence (SEQ ID NO:47) of clone
CG50817-06, directed toward novel peptidase (HPEP-8)-like proteins and nucleic acids encoding
them. This clone is a variant related to SECP11 and SECP12 (clones CG50817-04 and CG50817-05).
Figure 18 illustrates the nucleic acid and amino acid sequences, respectively. This
clone includes a nucleotide sequence (SEQ ID NO:46) of 1200 bp. The nucleotide sequence
includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides
33-35 and ending with a TGA codon at nucleotides 945-947. Putative untranslated regions, if
any, are found upstream from the initiation codon and downstream from the termination codon.
The encoded protein of 304 amino acid residues is presented using the one-letter code in
Figure 18.

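The ORF coordinates and the stated protein length can be cross-checked with simple arithmetic: 947 - 33 + 1 = 915 nucleotides, or 305 codons, one of which is the TGA stop, leaving 304 residues. A minimal Python sketch of that check (the function name is hypothetical), assuming 1-based, inclusive coordinates that run from the first base of the ATG to the last base of the stop codon:

```python
def orf_protein_length(start_codon_first_nt: int, stop_codon_last_nt: int) -> int:
    """Number of amino acids encoded by an ORF given 1-based, inclusive coordinates.

    The range spans the ATG through the stop codon; the stop codon is not translated.
    """
    orf_length_nt = stop_codon_last_nt - start_codon_first_nt + 1
    if orf_length_nt % 3 != 0:
        raise ValueError(f"ORF length {orf_length_nt} nt is not a multiple of 3")
    codons = orf_length_nt // 3
    return codons - 1  # drop the stop codon


# CG50817-06: ATG at nucleotides 33-35, TGA at nucleotides 945-947
print(orf_protein_length(33, 947))
```

Raising an error for lengths that are not a multiple of three flags coordinates that cannot describe a single in-frame ORF.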
The protein encoded by clone CG50817-06 is predicted by the PSORT program to localize to the
cytoplasm with a certainty of 0.4500 and does not appear to contain a signal peptide (see Table 18
below).

The DNA and protein sequences of a novel Peptidase-like gene, or one of its splice forms,
derived in this way are reported here as the invention CG50817-06. Genomic clones having
regions with 100% identity to the extended sequence were identified by BLASTN searches of
human genomic databases with the extended sequence. These genomic clones were selected for
further analysis because this identity indicates that they contain the genomic locus for these
SeqCalling assemblies.

The regions defined by all approaches were then manually integrated and corrected for
apparent inconsistencies, which may have arisen, for example, from miscalled bases in the
original fragments or from discrepancies in predicted homology to a similar protein, to derive
the final sequence of the invention CG50817-06 reported here. When necessary, the process of
identifying and analyzing SeqCalling assemblies, ESTs and genomic clones was repeated to
derive the full-length sequence.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid sequence of this invention has 840 of 842 bases (99%) identical to the Z34002 Human PRO351 nucleotide sequence from Homo sapiens (Tables 14 and 16). The full amino acid sequence of the protein of the invention was found to have 278 of 279 amino acid residues (99%) identical to, and 278 of 279 amino acid residues (99%) similar to, the 571-amino-acid Y41704 Human PRO351 protein from Homo sapiens (Table 15).
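The percentages printed in Tables 14 and 15 appear to be truncated to an integer rather than rounded: 635/848 is 74.88%, yet the table prints 74%. The sketch below checks that reading against several of the reported HSPs. Treating truncation as the BLAST program's rule is an inference from these numbers, not something the source states.

```python
# (identical positions, aligned length, percentage printed in the tables)
REPORTED = [
    (840, 842, 99),   # Table 14, Z34002, first HSP
    (635, 848, 74),   # Table 14, Z34002, second HSP
    (609, 1017, 59),  # Table 14, A37664, third HSP
    (278, 279, 99),   # Table 15, Y41704, first HSP
    (71, 203, 34),    # Table 15, Y41704, second HSP
]


def percent_truncated(matches: int, length: int) -> int:
    """Integer percentage with the fractional part dropped (exact integer arithmetic)."""
    return 100 * matches // length


def percent_rounded(matches: int, length: int) -> int:
    """Conventional rounding to the nearest integer, for comparison."""
    return round(100 * matches / length)


for matches, length, shown in REPORTED:
    print(matches, length, shown,
          percent_truncated(matches, length), percent_rounded(matches, length))
```

Every truncated value equals the printed one, whereas rounding would give 100, 75, 60, 100 and 35.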

A multiple sequence alignment is given in Table 17, with the protein of the invention shown on the first line of a ClustalW analysis comparing it with related protein sequences.

The presence of identifiable domains in the protein disclosed herein was determined by searches using algorithms such as PROSITE, Blocks, Pfam, ProDomain and Prints, and then determining the InterPro number by cross-referencing the domain match (or numbers) on the InterPro website. The results indicate that this protein contains the following protein domains (as defined by InterPro) at the indicated positions: a trypsin domain at amino acid positions 1 to 62 and a trypsin domain at amino acid positions 95 to 259. This indicates that the sequence of the invention has properties similar to those of other proteins known to contain these domains and similar to the properties of the domains themselves.
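Domain hits of this kind are usually handled as 1-based, inclusive residue intervals. As a small illustrative sketch (the data layout and function names are assumptions, not part of the InterPro tooling), the two trypsin intervals above can be queried by residue, and their coverage of the 304-residue protein can be counted:

```python
from typing import List, Tuple

# (domain name, first residue, last residue), 1-based and inclusive, as reported above.
DOMAINS: List[Tuple[str, int, int]] = [
    ("trypsin", 1, 62),
    ("trypsin", 95, 259),
]
PROTEIN_LENGTH = 304


def domains_at(residue: int) -> List[str]:
    """Names of all domains whose interval contains the residue."""
    return [name for name, start, end in DOMAINS if start <= residue <= end]


def covered_residues(domains: List[Tuple[str, int, int]]) -> int:
    """Residues lying inside at least one domain; overlapping hits are counted once."""
    covered = set()
    for _, start, end in domains:
        covered.update(range(start, end + 1))
    return len(covered)


print(domains_at(80))   # [] (residue 80 lies between the two hits)
print(domains_at(100))  # ['trypsin']
print(covered_residues(DOMAINS), "of", PROTEIN_LENGTH)  # 227 of 304
```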

Chromosomal information:

The Peptidase disclosed in this invention maps to chromosome 16. This information was assigned using OMIM, the electronic northern bioinformatic tool implemented by CuraGen Corporation, public ESTs, public literature references and/or genomic clone homologies, which were used to derive the chromosomal mapping of the SeqCalling assemblies, genomic clones, literature references and/or EST sequences that were included in the invention.

Tissue expression 

The Peptidase disclosed in this invention is expressed in at least the following tissues: adrenal gland, bone marrow, brain - amygdala, brain - cerebellum, brain - hippocampus, brain - substantia nigra, brain - thalamus, brain - whole, fetal brain, fetal kidney, fetal liver, fetal lung, heart, kidney, lymphoma - Raji, mammary gland, pancreas, pituitary gland, placenta, prostate, salivary gland, skeletal muscle, small intestine, spinal cord, spleen, stomach, testis, thyroid, trachea, uterus. This information was derived by determining the tissue sources of the sequences that were included in the invention, including but not limited to SeqCalling sources, public EST sources and/or RACE sources.

Cellular Localization and Sorting

The SignalP, PSORT and/or hydropathy profiles for the Peptidase-like protein are shown in Table 18. The results predict that this sequence has no signal peptide and, according to PSORT, is likely to be localized in the cytoplasm with a certainty of 0.4500.

Functional Variants and Homologs

The novel nucleic acid of the invention encoding a Peptidase-like protein includes the nucleic acid whose sequence is provided in Figure 18, or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Figure 18 while still encoding a protein that maintains its Peptidase-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to those just described, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of non-limiting example, modified bases and nucleic acids whose sugar-phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that it may be used, for example, as an antisense binding nucleic acid in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 1% of the residues may be so changed.

The novel protein of the invention includes the Peptidase-like protein whose sequence is provided in Figure 18. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Figure 18 while the protein still maintains its Peptidase-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 1% of the residues may be so changed.
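The "up to about 1%" limit can be made concrete with simple arithmetic. The sketch below assumes that the limit is rounded down and that only substitutions are counted (insertions and deletions would require an alignment first); both are assumptions for illustration, and the function names are hypothetical. Under these assumptions the limit is 12 bases for the 1200-bp clone and 3 residues for the 304-residue protein.

```python
def max_changes(length: int, percent: int = 1) -> int:
    """Largest whole number of positions not exceeding `percent` % of `length`."""
    return length * percent // 100


def count_substitutions(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("this sketch compares equal-length sequences only")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def within_limit(reference: str, variant: str, percent: int = 1) -> bool:
    """True if the variant differs from the reference at no more than the allowed positions."""
    return count_substitutions(reference, variant) <= max_changes(len(reference), percent)


# Hypothetical 1200-nt reference, used only to exercise the check.
reference = "A" * 1200
variant_12 = "C" * 12 + "A" * 1188
variant_13 = "C" * 13 + "A" * 1187

print(max_changes(1200), max_changes(304))  # 12 3
print(within_limit(reference, variant_12), within_limit(reference, variant_13))  # True False
```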

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab, F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the invention. Also encompassed within the invention are peptides and polypeptides comprising sequences having high binding affinity for any of the proteins of the invention, including such peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the surface of a carrier) such as a bacteriophage particle.



Uses of the Compositions of the Invention

The protein similarity information, expression pattern, and map location for the Peptidase-like protein and nucleic acid disclosed herein suggest that this Peptidase may have important structural and/or physiological functions characteristic of the Serine protease family. Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications and as a research tool. These include serving as a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of the nucleic acid or the protein is to be assessed, as well as potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) a composition promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications implicated in various diseases and disorders described below and/or other pathologies. For example, the compositions of the present invention will have efficacy for treatment of patients suffering from: cell proliferative disorder; arteriosclerosis; psoriasis; myelofibrosis; cancer; autoimmune disorder; Crohn's disease; inflammatory disorder; AIDS; anaemia; allergy; asthma; atherosclerosis; Graves' disease; multiple sclerosis; scleroderma; infection; diabetes; metabolic disorder; Addison's disease; cystic fibrosis; glycogen storage disease; obesity; nutritional edema; hypoproteinemia; and other diseases, disorders and conditions of the like.

These materials are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in therapeutic or diagnostic methods.



Table 14. BLASTN identity search for the nucleic acid of the invention.

>patn:Z34002 Human PRO351 nucleotide sequence - Homo sapiens, 2365 bp. (SEQ ID NO:71)
Length = 2365

Plus Strand HSPs:

Score = 4192 (629.0 bits), Expect = 1.9e-184, P = 1.9e-184
Identities = 840/842 (99%), Positives = 840/842 (99%), Strand = Plus / Plus

Query:    1 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 60
Sbjct:  936 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 995

Query:   61 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 120
Sbjct:  996 TGCAGGGCCCCTGTCAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGAC 1055

Query:  121 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 180
Sbjct: 1056 ACTGGGTTCAGGCTGGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTG 1115

Query:  181 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 240
Sbjct: 1116 TGCTGCTGACCAACACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAG 1175

Query:  241 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 300
Sbjct: 1176 CTTTCCTGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCT 1235

Query:  301 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 360
Sbjct: 1236 GTGGATCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGG 1295

Query:  361 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 420
Sbjct: 1296 CCAGGCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGG 1355

Query:  421 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 480
Sbjct: 1356 TGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCCCAGAGGAATGGAGCGTAGGGC 1415

Query:  481 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 540
Sbjct: 1416 TGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCATCCTGCATGGAGCCTACACCC 1475

Query:  541 ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 600
Sbjct: 1476 ACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCAGCCTGTGACACTGGGAG 1535

Query:  601 CCAGCCTGCGGCCCCTCTGCCTGCCCTATGCTGACCACCACCTGCCTGATGGGGAGCGTG 660
Sbjct: 1536 CCAGCCTGCGGCCCCTCTGCCTGCCCTATCCTGACCACCACCTGCCTGATGGGGAGCGTG 1595

Query:  661 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 720
Sbjct: 1596 GCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGCCCG 1655

Query:  721 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 780
Sbjct: 1656 TGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCA 1715

Query:  781 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 840
Sbjct: 1716 GCCCTATTCTGCCGGGGATGGTGTGTACCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGG 1775

Query:  841 CC 842
Sbjct: 1776 GC 1777



Score = 1915 (287.3 bits), Expect = 1.4e-81, P = 1.4e-81 (SEQ ID NO:114)
Identities = 635/848 (74%), Positives = 635/848 (74%), Strand = Plus / Plus

Query:  353 CTGGGAGGCCAGGCTGATGCAC-CAGGGACAGCTGGCCTGTGGCGGAGC--CCTGG--TG 407
Sbjct: 1508 CTGCTGGCCCAGCCTG-TG-ACACTGGGA--GCCAGCCTGCGGCCCCTCTGCCTGCCCTA 1563

Query:  408 TCA-GAGGAGGCGGTGC-TAACTGCTGCCCACTGCTTCATTGGGCGCCAGGCCC-CAGAG 464
Sbjct: 1564 TCCTGACCACCACCTGCCTGA-TGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAG 1622

Query:  465 GAATGGAGCGTAGGGCTGGGGACCAGACCGGAGGAGTGGGGCCTGAAGCAGCTCAT--CC 522
Sbjct: 1623 GAGCAG-GCATCAG-CTCCCT-CCAGACAGTGCCCGTGAC-CCTCCTGGGGCCTAGGGCC 1678

Query:  523 TGCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCC 582
Sbjct: 1679 TGCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGA 1733

Query:  583 AGCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CAC 638
Sbjct: 1734 TGG-TGTGTAC-CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT-CTGGGGCAC 1790

Query:  639 CACC--TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCA 696
Sbjct: 1791 CACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGA 1848

Query:  697 T-CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGC 753
Sbjct: 1849 TGCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGAC 1906

Query:  754 TGCATGCAGCTCCTGGGGGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGT 812
Sbjct: 1907 TGGGT-CAGCAGTTTGGACTG--G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG- 1960

Query:  813 GCTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACC 868
Sbjct: 1961 GCTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACC 2020

Query:  869 TGGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 928
Sbjct: 2021 TGGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTC 2079

Query:  929 CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 988
Sbjct: 2080 CCCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGG 2139

Query:  989 GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTG 1048
Sbjct: 2140 GAAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTG 2199

Query: 1049 TGGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 1108
Sbjct: 2200 TGGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTA 2259

Query: 1109 CCCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCA 1168
Sbjct: 2260 CTCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCA 2319

Query: 1169 GCAGTTTTCCTTTTTTTAAACTTAAATAAATT 1200
Sbjct: 2320 GCAGTTTTCCTTTTTTTAAACTTAAATAAATT 2351

Score = 267 (40.1 bits), Expect = 0.0078, P = 0.0078 (SEQ ID NO:115)
Identities = 349/598 (58%), Positives = 349/598 (58%), Strand = Plus / Plus

Query:  275 GAGTGA-TGAGGACAGCTGTGTAGCCTGTGGATCCTTGAGGACAGCAGGTCCCCAGGCAG 333
Sbjct:  424 GCGTGCCTGTGGACAGC-GTG--GCCCC-GGCCCCCCCAAGCCT-CAGGAGGGCAA-CAC 477

Query:  334 [illegible] 562
Sbjct:  478 [illegible] 709

Query:  563 GGCC-CTCCTGCTGCTGGCCCAG-CCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGC 620
Sbjct:  710 [...]CACTACAGCCAG-GGCTCAGACCTG-GCC-CTGCT-GC-AGC-T-CGCCCACCCCAC 762

Query:  621 [illegible] 677
Sbjct:  763 [illegible] 820

Query:  678 -GCCCGCCCAGGAGCAGGCATCAGCTCCCTCCAGACAGTGC-CC-GTGACCCTCCTGGGG 734
Sbjct:  821 [illegible] 876

Query:  735 [illegible] 790
Sbjct:  877 [illegible] 933

Query:  791 GCCGGGGATGG-TGTGTA-CCAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGCCAACCAA 848
Sbjct:  934 CCAGCGACACCTGTCCAACCCGGCCCG-GCCTGGGATGCTATG-TGGG-GGCCC-CCAG 988

Query:  849 [illegible]
Sbjct:  989 [illegible]

>patn:A37664 [description illegible] - Homo sapiens, 1661 bp. (SEQ ID NO:72)
Length = 1661

Plus Strand HSPs:



Score = 3831 (574.8 bits), Expect = 5.6e-168, P = 5.6e-168
Identities = 767/768 (99%), Positives = 767/768 (99%), Strand = Plus / Plus

Query:   75 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 134
Sbjct:  320 CAGGGAGATTCCGGGGGCCCTGTGCTGTGCCTCGAGCCTGACGGACACTGGGTTCAGGCT 379

Query:  135 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 194
Sbjct:  380 GGCATCATCAGCTTTGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTGACCAAC 439

Query:  195 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 254
Sbjct:  440 ACAGCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTTCAGGGGGCAGCTTTCCTGGCCCAG 499

Query:  255 [illegible] 314
Sbjct:  500 [illegible] 559

Query:  315 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 374
Sbjct:  560 ACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAGGCTGATGCAC 619

Query:  375 [illegible] 734
Sbjct:  620 [illegible] 979

Query:  735 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 794
Sbjct:  980 CCTAGGGCCTGCAGCCGGCTGCATGCAGCTCCTGGGGGTGATGGCAGCCCTATTCTGCCG 1039

Query:  795 [illegible] 842
Sbjct: 1040 [illegible] 1087

Score = 1931 ([illegible] bits), Expect = 3.7e-82, P = 3.7e-82 (SEQ ID NO:116)
Identities = [illegible], Strand = Plus / Plus

Query:  353 [illegible] 409
Sbjct:  818 [illegible] 873

Query:  410 AGAGGAGGCGGTGCTAACTGCTGCCCA-C-TG-CTTCATTGGGCGCCAGGCCC-CAGAGG 465
Sbjct:  874 TGCTGACCACCACCTGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGG 933

Query:  466 [illegible] 523
Sbjct:  934 [illegible] 989

Query:  524 GCATGGAGCCTACACCCACCCTGAGGGGGGCTACGACATGGCCCTCCTGCTGCTGGCCCA 583
Sbjct:  990 GCA-GCCGGCTGCATGCAGC-TCCTGGGGGTGATGGCA--GCCCTATT-CTGCCGGGGAT 1044

Query:  584 GCCTGTG-ACACTGGGA-GCCAGCCTGCGGCCCCTCTGCCTGC-CCTATGCTGAC-CACC 639
Sbjct: 1045 GG-TGTGTAC-CAGTGCTGTGGGTGAGCTGCCCAGCTGTGAGGGCCTGT-CTGGGGCACC 1101

Query:  640 ACC--TGCCTGATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGCAGGCAT 697
Sbjct: 1102 ACTGGTGCATGA-GGTGAGGGGCACATGGTTCCTGGCCGGGCT-GCACAGCTTCGGAGAT 1159

Query:  698 -CA-GCTCCCTCCA-GACAGTGCCCGTGACCCTCCTGGGGCCTAGGGCCTGCAGCCGGCT 754
Sbjct: 1160 GCTTGCCAAGGCCCCGCCAG-GCCGGCGGTCTTCACCGCGCTCCCTGCCTAT-GAGGACT 1217

Query:  755 GCATGCAGCTCCTGGGGGTGATGGCAGCCCTA-TTCTGCCGGGGATGGTGTGTACCAGTG 813
Sbjct: 1218 GGGT-CAGCAGTTTGGACTG--G-CAGGTCTACTTC-GCCGAGGAACCAGAGCCCGAG-G 1271

Query:  814 CTGTGGGTG-A-GCTGCCCAGCTGTGAG--GCCAACCAACCAGCTGCTGACAGGGGACCT 869
Sbjct: 1272 CTGAGCCTGGAAGCTGCCTGGCCAACATAAGCCAACCAACCAGCTGCTGACAGGGGACCT 1331

Query:  870 GGCCATTCTCAGGAACAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 929
Sbjct: 1332 GGCCATTCTCAGGA-CAAGAGAATGCAGGCAGGCAAATGGCATTACTGCCCCTGTCCTCC 1390

Query:  930 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 989
Sbjct: 1391 CCACCCTGTCATGTGTGATTCCAGGCACCAGGGCAGGCCCAGAAGCCCAGCAGCTGTGGG 1450

Query:  990 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1049
Sbjct: 1451 AAGGAACCTGCCTGGGGCCACAGGTGCCCACTCCCCACCCTGCAGGACAGGGGTGTCTGT 1510

Query: 1050 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1109
Sbjct: 1511 GGACACTCCCACACCCAACTCTGCTACCAAGCAGGCGTCTCAGCTTTCCTCCTCCTTTAC 1570

Query: 1110 CCTTTCAGATACAATCACGCCAGCCACGTTGTTTTGAAAATTTCTTTTTTTGGGGGGCAG 1169
Sbjct: 1571 [illegible] 1630

Query: 1170 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1200
Sbjct: 1631 CAGTTTTCCTTTTTTTAAACTTAAATAAATT 1661

Score = 559 (83.9 bits), Expect = 8.2e-17, P = 8.2e-17 (SEQ ID NO:117)
Identities = 609/1017 (59%), Positives = 609/1017 (59%), Strand = Plus / Plus

Query:    1 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 60
Sbjct:   93 AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG 152

Query:   61 TGCAGGGCCCCTGTCAGGGA-GATTCCGGGG-GCCCTGT-GCTGTGCCTCGAGCCTGACG 117
Sbjct:  153 TGCAGGGCCCCTGTCAGGTCTGATAGGGAGAAGAGAAGGAGCAGAAGGG-GAGGG-GCCT 210

Query:  118 GACACTGGGTTCAGGCTGGCA-TCATCAG--CTTTGCATCA-AGCTGTGCCCAGGAGGAC 173
Sbjct:  211 AACCCTGGGCTGGGGGTTGGACTCA-CAGGACTGGGGGAAAGAGCTGCAATCAG-AGGGT 268

Query:  174 GCTCCTGTGCT-GCTGACCA-ACACAGCTGCTCACAGTTCCTGGCTGCA-GGCTC----G- 226
Sbjct:  269 G-TC-TGCCATAGCTGGGCTCAGGCATCTG-TCCTTGG-CTTTGTTGCCTGGCTCCAGGG 324

Query:  227 AG-TTCAGGGGGCAGCTTTCCTG-GCCCAGAGCCC-AGAGACCCCGGAGATGAGTGATGA 283
Sbjct:  325 AGATTCCGGGGGCC-CTGTGCTGTGCCTCGAGCCTGACGGACACTGG-GTTCAG-GCTG- 380

Query:  284 GGACAGCTGTGTAGCCTGTGGATCCT--TGAGGACAGCAGGTC-C-CCAG-GCAGGAGCA 338
Sbjct:  381 -G-CATCA-TC-AGCTT-TGCATCAAGCTGTGCCCAGGAGGACGCTCCTGTGCTGCTG-A 434

Query:  339 CCCTCCCCATGGCCCTGGGAGG-CCAGGCTG-ATGCACCAGGGACAGCTGGCCTGTGGCG 396
Sbjct:  435 CCAACAC-A-GCTGCTCACAGTTCCTGGCTGCAGGCTCGAGTT-CAGGGGGCAGCTTTCC 491

Query:  397 GAGCCCTGGTGTCAGAGGAGGCGGTGCTAACTGCTGCCCACTGCTTCATTGGGCGCCAGG 456
Sbjct:  492 TGGCCCAGAGCCCAGAGACCCCGGAGATGAGTGATGAGGACAGCTGTGTAGCCTGTG-GA 550

Query:  457 CCCCAGAGGAATGGAG--CGTAGGGCTGGGG-ACCAGACCGGAGGAGTGGGGCCTGAAGC 513
Sbjct:  551 TCCTTGAGGACAGCAGGTCCCCAGGCAGGAGCACCCTCCCCATGGCCCTGGGAGGCCAG- 609

Query:  514 AGCTCATCCTGCATGGAGC-CTACACCCACCCTGAGGGGGGCTA-C-GACATGGCCCTCC 570
Sbjct:  610 -GCTGATGCACCAGGGACAGCTGGCCTGTGGCGGAGCCCTGGTGTCAGAGGAGGCGGTGC 668

Query:  571 TG-CTGCTGGCCCAGCCTGTGACACTGGGAGCCAGCCTGCGGCCCCTCTGCCTGCCCTAT 629
Sbjct:  669 [illegible] 722

Query:  630 [illegible] 685
Sbjct:  723 [illegible] 781

Query:  686 AGGAGCAGGCATCAGCTCC-CTCCAGACAGTGCCCGTGACCCTCCTGGG---GCCTAGGG 741
Sbjct:  782 ACC--CACCC-TGAGGGGGGCTAC-GACATGGCCC-TCCTGCTGCTGGCCCAGCCTGTGA 836

Query:  742 C-CTGC-AGCCGGC-TGCATGCAGCTCCTGGGGGTGATG-GCAG-CC-CTATTCTGCCGG 795
Sbjct:  837 CACTGGGAGCCAGCCTGCG-GCCCCTC-TGCCTGCCCTATGCTGACCACCAC-CTGCCTG 893

Query:  796 GGATGGTGTGTACCAGTGCTGTGGGT-GAGCT-GCCCAGCTGTGAGGCCAACCAACCAGC 853
Sbjct:  894 ATGGGGAGCGTGGCTGGGTTCTGGGACGGGCCCGCCCAGGAGC-AGGC--ATCAGCTCCC 950

Query:  854 [illegible] 968
Sbjct:  951 [illegible] 1064

Query:  969 CAGAAGCCCAGCAGCTGTGGGAAGGAACCTGCCTGGGGC--CACAGGTGC 1016
Sbjct: 1065 GTGA-GCTGCCCAGCTGTGAG--GG--CCTGTCTGGGGCACCACTGGTGC 1109
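Each HSP in these tables is introduced by a Score line and an Identities line. The following is a minimal parsing sketch that assumes the line layout shown in Table 14. The regular expressions and the function name are illustrative and do not come from any BLAST distribution. The separator after the bit score may be either a comma or a period, because both appear in the printed tables.

```python
import re

SCORE_RE = re.compile(
    r"Score\s*=\s*(?P<score>\d+)\s*\((?P<bits>[\d.]+)\s*bits\)\s*[,.]\s*"
    r"Expect\s*=\s*(?P<expect>[\d.eE+-]+)\s*,\s*P\s*=\s*(?P<p>[\d.eE+-]+)"
)
IDENT_RE = re.compile(
    r"Identities\s*=\s*(?P<ident>\d+)/(?P<length>\d+)\s*\((?P<pct>\d+)%\)\s*,\s*"
    r"Positives\s*=\s*(?P<pos>\d+)/\d+"
)


def parse_hsp_header(score_line: str, identity_line: str) -> dict:
    """Extract score, bits, E-value, P-value and identity counts from one HSP header."""
    s = SCORE_RE.search(score_line)
    i = IDENT_RE.search(identity_line)
    if s is None or i is None:
        raise ValueError("line does not look like a BLAST HSP header")
    return {
        "score": int(s["score"]),
        "bits": float(s["bits"]),
        "expect": float(s["expect"]),
        "p": float(s["p"]),
        "identities": int(i["ident"]),
        "length": int(i["length"]),
        "percent": int(i["pct"]),
        "positives": int(i["pos"]),
    }


hsp = parse_hsp_header(
    "Score = 4192 (629.0 bits), Expect = 1.9e-184, P = 1.9e-184",
    "Identities = 840/842 (99%), Positives = 840/842 (99%), Strand = Plus / Plus",
)
print(hsp["score"], hsp["bits"], hsp["identities"], hsp["length"])  # 4192 629.0 840 842
```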



Table 15. BLASTP identity search for the protein of the invention.

>patp:Y41704 Human PRO351 protein sequence - Homo sapiens, 571 aa. (SEQ ID NO:73)
Length = 571

Plus Strand HSPs:

Score = 1514 (533.0 bits), Expect = 1.6e-154, P = 1.6e-154
Identities = 278/279 (99%), Positives = 278/279 (99%), Frame = +3

Query:    3 RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPV 182
Sbjct:  215 RHLSNPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPV 274

Query:  183 LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 362
Sbjct:  275 LLTNTAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEA 334

Query:  363 RLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 542
Sbjct:  335 RLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTH 394

Query:  543 PEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWVLGRARPGAGISSLQTVPV 722
Sbjct:  395 PEGGYDMALLLLAQPVTLGASLRPLCLPYPDHHLPDGERGWVLGRARPGAGISSLQTVPV 454

Query:  723 TLLGPRACSRLHAAPGGDGSPILPGMVCTSAVGELPSCE 839
Sbjct:  455 TLLGPRACSRLHAAPGGDGSPILPGMVCTSAVGELPSCE 493

Score = 225 (79.2 bits), Expect = 4.6e-15, P = 4.6e-15 (SEQ ID NO:118)
Identities = 71/203 (34%), Positives = 95/203 (46%), Frame = +3

Query:  339 PSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGRQAPE--EWSVGLGTRP 494
Sbjct:   63 PGEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSVVLGSLQREGLSP 122

Query:  495 --EEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYADHHLPDGERGWV 668
Sbjct:  123 GAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTH----TPLCLPQPAHRFPFGASCWA 178

Query:  669 [illegible]
Sbjct:  179 [illegible]

Query:  834 [illegible]
Sbjct:  234 [illegible]

Score = 125 ([illegible] bits), Expect = 0.00067, P = 0.00067 (SEQ ID NO:119)
Identities = [illegible], Frame = +3

Query:   15 [illegible]
Sbjct:  474 [illegible]

Query:  195 [illegible]
Sbjct:  532 [illegible]

>patp:Y90291 Human peptidase, HPEP-8 protein sequence - Homo sapiens, 267 aa. (SEQ ID NO:74)
Length = 267

Plus Strand HSPs:

Score = 1028 (361.9 bits), Expect = 5.0e-103, P = 5.0e-103
Identities = 189/189 (100%), Positives = 189/189 (100%), Frame = +3

Query:  273 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 452
Sbjct:    1 MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHCFIGR 60

Query:  453 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 632
Sbjct:   61 QAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLCLPYA 120

Query:  633 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 812
Sbjct:  121 DHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGMVCTS 180

Query:  813 AVGELPSCE 839
Sbjct:  181 AVGELPSCE 189

Score = 125 (44.0 bits), Expect = 0.00016, P = 0.00016 (SEQ ID NO:120)
Identities = 32/95 (33%), Positives = 47/95 (49%), Frame = +3

Query:   15 NPARPGMLCGGPQPGVQGPCQGDSGGPVLCLEPDGHWVQAGIISFASSCAQEDAPVLLTN 194
Sbjct:  170 SPILPGMVCTSAV-GELPSCEGLSGAP-LVHEVRGTWFLAGLHSFGDACQGPARPAVFTA 227

Query:  195 TAAHSSWLQARVQGAAFLAQSPETPEMSDEDSCVA 299
Sbjct:  228 LPAYEDWVSS-LDWQVYFAEEPE-PE-AEPGSCLA 259
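In Table 15, the query positions are nucleotide coordinates of the translated clone (Frame = +3), not residue numbers. The following sketch shows the conversion. It assumes the ORF starts at nucleotide 33, as stated in the description of Figure 18, and it uses the standard genetic code. The helper names are hypothetical.

```python
# Standard genetic code, with codons enumerated in TCAG order.
STANDARD_CODE = dict(zip(
    (a + b + c for a in "TCAG" for b in "TCAG" for c in "TCAG"),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))
ORF_START = 33  # first base of the ATG initiation codon


def nt_to_residue(nt: int, orf_start: int = ORF_START) -> int:
    """Residue number (Met = 1) of the codon whose first base is at nucleotide `nt`."""
    offset = nt - orf_start
    if offset < 0 or offset % 3 != 0:
        raise ValueError("not the first base of a codon downstream of the ORF start")
    return offset // 3 + 1


def translate(seq: str) -> str:
    """Translate complete codons from the first base of `seq`; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(STANDARD_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


HEAD = "AGCGACACCTGTCCAACCCGGCCCGGCCTGGGATGCTATGTGGGGGCCCCCAGCCTGGGG"  # nt 1-60

print(nt_to_residue(273))      # 81: Query 273 is aligned to Y90291 residue 1
print(nt_to_residue(839 - 2))  # 269: last codon (nt 837-839) of that alignment
print(translate(HEAD[2:]))     # RHLSNPARPGMLCGGPQPG (frame +3 from nt 3)
```

Residues 81 to 269 span 189 positions, which matches the 189/189 Y90291 HSP. The Y41704 alignment begins at nucleotide 3, which is ten codons upstream of the ATG. Its first residues (RHLSNPARPG) therefore fall before residue 1 of the ORF, so `nt_to_residue(3)` raises rather than returning a residue number.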





Table 16. BLASTN identity searcli (versus the human SeqCalling database for the 
15 Peptidase-like protein of the invention. 

>s3aq: 132854740 Category D: 12 frag (12 non-5 * sig-CG) , 636 bp. (SEQ ID NO: 75) 
Length = 63 6 



20 



35 



50 



55 



Minus Strand HSPs:

Score = 1423 (213.5 bits). Expect = 7.0e-59, P = 7.0e-59
Identities = 313/343 (91%), Positives = 313/343 (91%), Strand = Minus / Plus

Query: 754 AGCCGGCTGCAG-GCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGAT 696
Sbjct: 295 AGCTGGCTGCCCCGGCCT-GCAGGTTGGATGGACAGCAGCCCTGGCCCT-GTGCCCACCT 352

Query: 695 GCCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 636
Sbjct: 353 ACCTGCTCCTGGGCGGGCCCGTCCCAGAACCCAGCCACGCTCCCCATCAGGCAGGTGGTG 412

Query: 635 GTCAGCATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 576
Sbjct: 413 GTCAGGATAGGGCAGGCAGAGGGGCCGCAGGCTGGCTCCCAGTGTCACAGGCTGGGCCAG 472

Query: 575 CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 516
Sbjct: 473 CAGCAGGAGGGCCATGTCGTAGCCCCCCTCAGGGTGGGTGTAGGCTCCATGCAGGATGAG 532

Query: 515 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 456
Sbjct: 533 CTGCTTCAGGCCCCACTCCTCCGGTCTGGTCCCCAGCCCTACGCTCCATTCCTCTGGGGC 592

Query: 455 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 412
Sbjct: 593 CTGGCGCCCAATGAAGCAGTGGGCAGCAGTTAGCACCGCCTCCT 636

Score = 757 (113.6 bits). Expect = 8.5e-29, P = 8.5e-29 (SEQ ID NO:121)
Identities = 165/179 (92%), Positives = 165/179 (92%), Strand = Minus / Plus

Query: 869 AGGTCCCCTGTCAGCAGCTGGTTGGTTGGCCTCACAGCTGGGCAGCTCACCCACAGCACT 810
Sbjct: 105 AGGTAAGGTGTGGGGGCCTGG--GGCTCACCTCACAGCTGGGCAGCTCACCCACAGCACT 162

Query: 809 GGTACACACCATCCCCGGCAGAATAGGGCTGCCATCACCCCCAGGAGCTGCATGCAGCCG
Sbjct: 163 [illegible]

Query: 749 GCTGCAGGCCCTAGGCCCCAGGAGGGTCACGGGCACTGTCTGGAGGGAGCTGATGCCTG
Sbjct: 223 [illegible]



>s3aq:134913963 Category E: 1 frag (1 non-CG EST), 415 bp. (SEQ ID NO:76)
Length = 415

Plus Strand HSPs:

Score = 297 (44.6 bits). Expect = 8.0e-07, P = 8.0e-07
Identities = 61/63 (96%), Positives = 61/63 (96%), Strand = Plus / Plus

Query: 1138 TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 1197
Sbjct:   10 TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA 69

Query: 1198 ATT 1200
Sbjct:   70 ATT 72
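Each percentage in these BLAST summaries is the number of identical columns divided by the alignment length. The minimal Python sketch below recounts the Plus/Plus HSP above from its two aligned rows. It assumes that the report truncates to a whole percent rather than rounding. That assumption is consistent with every value in these tables, including 61/63, which is reported as 96% rather than 97%. The function names are hypothetical.

```python
"""Recount BLAST-style identities from two aligned rows (illustrative sketch)."""


def identity_counts(query: str, sbjct: str) -> tuple[int, int]:
    """Return (identical columns, alignment length) for two aligned rows.

    Gap characters ('-') never count as identities but do count toward length.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return identities, len(query)


def reported_percent(identities: int, length: int) -> int:
    """Whole-number percent, truncated (an assumption based on the tabulated values)."""
    return 100 * identities // length


# Rows of the Plus/Plus HSP against s3aq:134913963 (query 1138-1200, sbjct 10-72).
QUERY = "TTGTTTTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA" "ATT"
SBJCT = "TTGGTGTGAAAATTTCTTTTTTTGGGGGGCAGCAGTTTTCCTTTTTTTAAACTTAAATAA" "ATT"

if __name__ == "__main__":
    ident, length = identity_counts(QUERY, SBJCT)
    print(f"Identities = {ident}/{length} ({reported_percent(ident, length)}%)")
```

The same truncation rule also matches 313/343 (91%) and 165/179 (92%) above.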



Table 17. ClustalW alignment of the protein of the invention.

CG50817-06
Y41704      MLLSSLVSLAOSVYLAWILFFVLYDFCIVCITTYAINVSLMWLSFRKVQEPQGKAKRHGN
Y90291

CG50817-06
Y41704      TVPOEWPWQASVRRQGAHICSGSLVADTWVLTAAHCFEKAAATELNSWSVVLGSLQREGL
Y90291

CG50817-06
Y41704      SFGAEEVGVAALQLPRAYNHYSQGSDLALLQLAHPTTHTPLCLPQPAHRFPFGASCWATG
Y90291

CG50817-06
Y41704      WDQDTSDAPGTLRNLRLRHSRPTCNCIYNQLHQRHLSNPARPG [remainder illegible]
Y90291

[block illegible]

CG50817-06  ETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHC
Y41704      ETPEMSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHC
Y90291          MSDEDSCVACGSLRTAGPQAGAPSPWPWEARLMHQGQLACGGALVSEEAVLTAAHC

CG50817-06  FIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLC
Y41704      FIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLC
Y90291      FIGRQAPEEWSVGLGTRPEEWGLKQLILHGAYTHPEGGYDMALLLLAQPVTLGASLRPLC

CG50817-06  LPYADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGM
Y41704      LPYQDHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGM
Y90291      LPVADHHLPDGERGWVLGRARPGAGISSLQTVPVTLLGPRACSRLHAAPGGDGSPILPGM

Information for the ClustalW proteins:

Accno                        Common Name                                 Length
CG50817-06 (SEQ ID NO:47)    novel Peptidase-like protein
Y41704 (SEQ ID NO:122)       Human PRO351 protein sequence               571
Y90291 (SEQ ID NO:123)       Human peptidase, HPEP-8 protein sequence    267

In the alignment shown above, black outlined amino acid residues indicate regions of conserved sequence (i.e., regions that may be required to preserve structural or functional properties). Greyed amino acid residues can be mutated to a residue with comparable steric and/or chemical properties without altering protein structure or function (e.g., L to V, I, or M). Non-highlighted amino acid residues can potentially be mutated to a much broader extent without altering structure or function.

Table 18. PSORT, SignalP and hydropathy results for CG50817-06

cytoplasm                     Certainty=0.4500 (Affirmative) < succ>
microbody (peroxisome)        Certainty=0.3000 (Affirmative) < succ>
lysosome (lumen)              Certainty=0.2334 (Affirmative) < succ>
mitochondrial matrix space    Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure  Position  Value  Cutoff  Conclusion
  max. C   45        0.253  0.37    NO
  max. Y   17        0.064  0.34    NO
  max. S   68        0.536  0.88    NO
  mean S   1-16      0.130  0.48    NO
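Each SignalP row compares one score with a fixed cutoff. The minimal sketch below rebuilds the Conclusion column from the Value and Cutoff columns. It assumes that a row is called YES when its value exceeds the cutoff, which is consistent with every row of this table and of Table 22 (CG51099-03). The `Measure` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Measure(NamedTuple):
    """One row of a SignalP-style summary table (hypothetical container)."""
    name: str
    position: str
    value: float
    cutoff: float


def conclude(m: Measure) -> str:
    """Assumed rule: YES when the score is above its cutoff, otherwise NO."""
    return "YES" if m.value > m.cutoff else "NO"


# Rows copied from Table 18 (CG50817-06).
TABLE_18 = [
    Measure("max. C", "45", 0.253, 0.37),
    Measure("max. Y", "17", 0.064, 0.34),
    Measure("max. S", "68", 0.536, 0.88),
    Measure("mean S", "1-16", 0.130, 0.48),
]

# Rows copied from Table 22 (CG51099-03).
TABLE_22 = [
    Measure("max. C", "40", 0.888, 0.37),
    Measure("max. Y", "40", 0.848, 0.34),
    Measure("max. S", "30", 0.975, 0.88),
    Measure("mean S", "1-39", 0.708, 0.48),
]

if __name__ == "__main__":
    for label, rows in (("Table 18", TABLE_18), ("Table 22", TABLE_22)):
        for m in rows:
            print(label, m.name, m.value, m.cutoff, conclude(m))
```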

SECP14

A SECP14 nucleic acid and polypeptide according to the invention includes the nucleic acid sequence (SEQ ID NO:48) and encoded polypeptide sequence (SEQ ID NO:49) of clone CG51099-03, directed toward novel serine protease-like proteins and nucleic acids encoding them. Figure 19 illustrates the nucleic acid and amino acid sequences, respectively.

This clone includes a nucleotide sequence (SEQ ID NO:48) of 1214 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides 31-33 and ending at nucleotides 1186-1188. Putative untranslated regions, if any, are found upstream from the initiation codon and downstream from the termination codon. The encoded protein of 385 amino acid residues is presented using the one-letter code in Figure 19. The protein encoded by clone CG51099-03 is predicted by the PSORT program to be located outside the cell with a certainty of 0.5804, and appears to be a signal protein (see Table 22 below).
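The protein length follows directly from the ORF coordinates. The stop codon is not translated, so the residue count is the distance between the first nucleotides of the start and stop codons, divided by three. The small sketch below (hypothetical helper name) checks this for SECP14. It also checks SECP15, using the coordinates given later for that clone (ATG at 155-157, TAG at 881-883, 242 residues).

```python
def orf_protein_length(start_first_nt: int, stop_first_nt: int) -> int:
    """Residues encoded from an ATG up to, but not including, an in-frame stop.

    Both arguments are 1-based positions of the first nucleotide of the codon.
    """
    span = stop_first_nt - start_first_nt
    if span <= 0 or span % 3:
        raise ValueError("stop codon is not downstream of and in frame with the start codon")
    return span // 3


if __name__ == "__main__":
    print("SECP14:", orf_protein_length(31, 1186))  # ATG 31-33, stop 1186-1188
    print("SECP15:", orf_protein_length(155, 881))  # ATG 155-157, TAG 881-883
```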






The serine protease tryptase (EC 3.4.21.59) is expressed almost exclusively in mast cells. On mast cell degranulation it is released in an enzymatically active form, together with other mediators such as histamine, into the extracellular space and the circulation. The enzyme can directly stimulate several cell types, cleave polypeptide hormones and activate pro-enzymes. These capabilities suggest a role for tryptase in inflammatory and tissue-remodeling processes. In the skin, therefore, tryptase may be involved not only in mastocytosis and immediate-type hypersensitivity reactions, but also in other inflammatory diseases, in degenerative or neoplastic conditions, and in wound healing, wherever an accumulation and/or activation of mast cells is found. Extracellular tryptase may be superior to histamine as a parameter for the onset and course of immediate-type reactions and as an indicator for the activation of mast cells in other conditions. Its absence during histamine-liberating reactions may suggest basophil activation. In addition, tryptase has been shown to be a sensitive and specific marker for the localization of mast cells in tissues (Ludolf-Hauser et al., 1999, Hautarzt 50:556-61).

Tryptases are stored in abundance in the secretory granules of mouse (McNeil et al., 1992, Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178; Johnson, D. A., and Barton, G., 1992, Protein Sci. 1, 370-377) and human (Vanderslice et al., 1990, Proc. Natl. Acad. Sci. U. S. A. 87, 3811-3815) mast cells (MCs). In humans, the four homologous tryptases that have been cloned (designated tryptases I, II/…, III, and …) reside at a complex on chromosome 16 (Pallaoro et al., 1999, J. Biol. Chem. 274, 3355-3362). Only two tryptases, designated mouse MC protease (mMCP)-6 and mMCP-7, have been identified so far in the mouse. Their genes reside ~1.2 centimorgans apart on the syntenic region of mouse chromosome 17 (Gurish et al., 1994, Mammal. Genome 5, 656-657). Despite the chromosomal clustering of their genes, these mouse tryptases are differentially regulated in vivo (Reynolds et al., 1990, Proc. Natl. Acad. Sci. U. S. A. 87, 3230-3234) and in vitro (Reynolds et al., 1991, J. Biol. Chem. 266, 3847-3853; McNeil et al., 1992, Proc. Natl. Acad. Sci. U. S. A. 89, 11174-11178), both at the level of gene transcription (Morri et al., 1996, Blood 88, 2488-2494) and at the level of mRNA stability.

All known mouse and human tryptases in this family are initially translated as zymogens. They possess an ~20-residue hydrophobic signal peptide, which is presumed to be removed in the endoplasmic reticulum immediately after the translated zymogen is translocated into the lumen. They also possess an ~10-residue propeptide preceding the mature portion of the enzyme, which consists of ~245 amino acids. Tryptases undergo variable N-linked glycosylation during their biosynthesis (Ghildyal et al., 1994, J. Immunol. 153, 2624-2630). Nevertheless, the current members of the family appear to be targeted to the secretory granule by a serglycin proteoglycan-dependent mechanism (Ghildyal et al., 1996, J. Exp. Med. 184, 1061-1073), rather than by the Man-PO4-dependent mechanism used by classical lysosomal enzymes.

Recently, Wong et al. (1999, J Biol Chem 274, 30784-30793) described a novel mouse gene and its human ortholog, which encode an unusual transmembrane tryptase (TMT). Comparative structural studies indicated that the putative TMT possesses a unique substrate-binding cleft. RNA blot analyses showed that mTMT is expressed in mice in both strain- and tissue-dependent manners. Thus, different transcriptional and/or post-transcriptional mechanisms are used to control the expression of mTMT in vivo. Analysis of the corresponding tryptase locus in the human genome resulted in the isolation and characterization of the hTMT gene. The hTMT transcript is expressed in numerous tissues and is also translated. Analysis of the tryptase family of genes in mice and humans now indicates that a primordial serine protease gene duplicated early and often during the evolution of mammals. These duplications generated, in each species, a panel of homologous tryptases that differ in their tissue expression, substrate specificities, and physical properties.

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid sequence of this invention is identical over 1213 of 1213 bases (100%) to a gb:GENBANK-ID:AX079882|acc:AX079882.1 mRNA from Homo sapiens (Sequence 13 from Patent WO0105971) (see Table 19). The full amino acid sequence of the protein of the invention was found to be identical over 385 of 385 amino acid residues (100%), and similar over 385 of 385 residues (100%), to the 385-residue ptnr:SPTREMBL-ACC:Q9UI38 protein from Homo sapiens (Human) (TESTES-SPECIFIC PROTEIN TSP50) (see Table 20).

A multiple sequence alignment is given in Table 21, with the protein of the invention 
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 

The presence of identifiable domains in the protein disclosed herein was determined by searches against domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints. The domains found were then identified by their InterPro domain accession numbers. Significant domains are summarized below:

Model    Domain  seq-f  seq-t  hmm-f  hmm-t  score  E-value
trypsin  1/2     118    297    6      199    104.4  2.6e-32
trypsin  2/2     313    353    215    259    35.9   1.6e-10

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system. In this system an aspartic acid residue is hydrogen-bonded to a histidine, which is in turn hydrogen-bonded to a serine. The sequences in the vicinity of the active-site serine and histidine residues are well conserved in this family of proteases (Sprang et al., 1987, Science 237:905-909). A partial list of proteases known to belong to the trypsin family is shown below.

- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen, and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin, prostate specific antigen (PSA) and tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin-like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.

All the above proteins belong to family S1 in the classification of peptidases.

This indicates that the sequence of the invention has properties similar to those of other proteins known to contain this/these domain(s) and similar to the properties of these domains.
Chromosomal information:

The Serine Protease-like gene disclosed in this invention maps to chromosome 3. This assignment was made using mapping information associated with genomic clones, public genes and ESTs sharing sequence identity with the disclosed sequence, together with CuraGen Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Serine Protease-like gene disclosed in this invention is expressed in at least the following tissues: adipose, adrenal gland, thyroid, brain, heart, skeletal muscle, bone marrow, colon, bladder, liver, lung, mammary gland, placenta, testis. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. No. CG51099-03. The sequence is also predicted to be expressed in testis, based on the expression pattern of a closely related Homo sapiens homolog (gb:GENBANK-ID:AX079882|acc:AX079882.1, Sequence 13 from Patent WO0105971).

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy results for the Serine Protease-like protein are shown in Table 22. The results predict that this sequence has a signal peptide and is likely to be localized extracellularly, with a certainty of 0.5804. SignalP predicts that the signal peptide is cleaved between amino acids 39 and 40 (CWG-AG).
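The reported site can be checked against the sequence itself. The minimal sketch below takes the first 60 residues of CG51099-03 as printed in the BLASTP query line of Table 20 and splits the chain after residue 39. It then formats the three residues before the cut and the two after it in SignalP's CWG-AG notation. The helper names are hypothetical.

```python
# Residues 1-60 of CG51099-03, copied from the BLASTP query line in Table 20.
N_TERMINUS = "MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV"


def split_at_cleavage(seq: str, last_signal_residue: int) -> tuple[str, str]:
    """Split between residue n and n+1 (1-based); return (signal peptide, remainder)."""
    return seq[:last_signal_residue], seq[last_signal_residue:]


def site_label(seq: str, last_signal_residue: int, left: int = 3, right: int = 2) -> str:
    """Format the residues flanking the cut in 'XXX-YY' style."""
    n = last_signal_residue
    return f"{seq[n - left:n]}-{seq[n:n + right]}"


if __name__ == "__main__":
    signal, mature = split_at_cleavage(N_TERMINUS, 39)
    print(len(signal), site_label(N_TERMINUS, 39), mature[:10])
```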

Functional Variants and Homologs

The novel nucleic acid of the invention encoding a Serine Protease-like protein includes the nucleic acid whose sequence is provided in Figure 19, or a fragment thereof. The invention also includes a mutant or variant nucleic acid, or a fragment of such a nucleic acid, any of whose bases may be changed from the corresponding base shown in Figure 19 while still encoding a protein that maintains its Serine Protease-like activities and physiological functions. The invention further includes nucleic acids whose sequences are complementary to the sequence of CuraGen Acc. No. CG51099-03, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of non-limiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, so that it may be used, for example, as an antisense binding nucleic acid in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 0% of the bases may be so changed.

The novel protein of the invention includes the Serine Protease-like protein whose sequence is provided in Figure 19. The invention also includes a mutant or variant protein, or a functional fragment thereof, any of whose residues may be changed from the corresponding residue shown in Figure 19 while still maintaining its Serine Protease-like activities and physiological functions. In the mutant or variant protein, up to about 0% of the amino acid residues may be so changed.

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab, F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the invention. Also encompassed within the invention are peptides and polypeptides comprising sequences having high binding affinity for any of the proteins of the invention. These include such peptides and polypeptides fused to a carrier particle, such as a bacteriophage particle, or biologically expressed on the surface of such a carrier.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, cellular localization, and map location for the protein and nucleic acid disclosed herein suggest that this Serine Protease-like protein may have important structural and/or physiological functions characteristic of the Trypsin family. Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications and as a research tool. These include serving as a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of the nucleic acid or the protein is to be assessed. These also include potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or treatment of various diseases and disorders. For example, the compositions of the present invention will have efficacy for the treatment of patients suffering from: adrenoleukodystrophy, congenital adrenal hyperplasia, hyperthyroidism, hypothyroidism, Von Hippel-Lindau (VHL) syndrome, Alzheimer's disease, stroke, tuberous sclerosis, hypercalcemia, Parkinson's disease, Huntington's disease, cerebral palsy, epilepsy, Lesch-Nyhan syndrome, multiple sclerosis, ataxia-telangiectasia, leukodystrophies, behavioral disorders, addiction, anxiety, pain, neurodegeneration, cardiomyopathy, atherosclerosis, hypertension, congenital heart defects, aortic stenosis, atrial septal defect (ASD), atrioventricular (A-V) canal defect, ductus arteriosus, pulmonary stenosis, subaortic stenosis, ventricular septal defect (VSD), valve diseases, scleroderma, obesity, transplantation, muscular dystrophy, myasthenia gravis, hemophilia, hypercoagulation, idiopathic thrombocytopenic purpura, autoimmune disease, allergies, immunodeficiencies, graft versus host disease, cirrhosis, systemic lupus erythematosus, asthma, emphysema, ARDS, fertility, cancer, as well as other diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 

Table 19. BLASTN search using CuraGen Acc. No. CG51099-03.

>gb:GENBANK-ID:AX079882|acc:AX079882.1 Sequence 13 from Patent WO0105971 - Homo sapiens, 1359 bp. (SEQ ID NO:77)
Length = 1359

Plus Strand HSPs:

Score = 6065 (910.0 bits). Expect = 4.8e-268, P = 4.8e-268
Identities = 1213/1213 (100%), Positives = 1213/1213 (100%), Strand = Plus / Plus

Query:    1 CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 60
Sbjct:   15 CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC 74

Query:   61 GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 120
Sbjct:   75 GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG 134

Query:  121 TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 180
Sbjct:  135 TTGCTGAGGTCTGCAGGTTGCTGGGGCGCAGGGGAAGCCCCGGGGGCGCTGTCCACTGCT 194

Query:  181 GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 240
Sbjct:  195 GATCCCGCCGACCAGAGCGTCCAGTGTGTCCCCAAGGCCACCTGTCCTTCCAGCCGGCCT 254

Query:  241 CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 300
Sbjct:  255 CGCCTTCTCTGGCAGACCCCGACCACCCAGACACTGCCCTCGACCACCATGGAGACCCAA 314

Query:  301 TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 360
Sbjct:  315 TTCCCAGTTTCTGAAGGCAAAGTCGACCCATACCGCTCCTGTGGCTTTTCCTACGAGCAG 374

Query:  361 GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 420
Sbjct:  375 GACCCCACCCTCAGGGACCCAGAAGCCGTGGCTCGGCGGTGGCCCTGGATGGTCAGCGTG 434

Query:  421 CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 480
Sbjct:  435 CGGGCCAATGGCACACACATCTGTGCCGGCACCATCATTGCCTCCCAGTGGGTGCTGACT 494

Query:  481 GTGGCCCACTGCCTGATCTGGCGTGATGTTATCTACTCAGTGAGGGTGGGGAGTCCGTGG 540
Sbjct:  495 GTGGCCCACTGCCTGATCTGGCGTGATGTTATCTACTCAGTGAGGGTGGGGAGTCCGTGG 554

Query:  541 ATTGACCAGATGACGCAGACCGCCTCCGATGTCCCGGTGCTCCAGGTCATCATGCATAGC 600
Sbjct:  555 ATTGACCAGATGACGCAGACCGCCTCCGATGTCCCGGTGCTCCAGGTCATCATGCATAGC 614

Query:  601 AGGTACCGGGCCCAGCGGTTCTGGTCCTGGGTGGGCCAGGCCAACGACATCGGCCTCCTC 660
Sbjct:  615 AGGTACCGGGCCCAGCGGTTCTGGTCCTGGGTGGGCCAGGCCAACGACATCGGCCTCCTC 674

Query:  661 AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 720
Sbjct:  675 AAGCTCAAGCAGGAACTCAAGTACAGCAATTACGTGCGGCCCATCTGCCTGCCTGGCACG 734

Query:  721 GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 780
Sbjct:  735 GACTATGTGTTGAAGGACCATTCCCGCTGCACTGTGACGGGCTGGGGACTTTCCAAGGCT 794

Query:  781 GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 840
Sbjct:  795 GACGGCATGTGGCCTCAGTTCCGGACCATTCAGGAGAAGGAAGTCATCATCCTGAACAAC 854

Query:  841 AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 900
Sbjct:  855 AAAGAGTGTGACAATTTCTACCACAACTTCACCAAAATCCCCACTCTGGTTCAGATCATC 914

Query:  901 AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 960
Sbjct:  915 AAGTCCCAGATGATGTGTGCGGAGGACACCCACAGGGAGAAGTTCTGCTATGAGCTAACT 974

Query:  961 GGAGAGCCCTTGGTCTGCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1020
Sbjct:  975 GGAGAGCCCTTGGTCTGCTCCATGGAGGGCACGTGGTACCTGGTGGGATTGGTGAGCTGG 1034

Query: 1021 GGTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1080
Sbjct: 1035 GGTGCAGGCTGCCAGAAGAGCGAGGCCCCACCCATCTACCTACAGGTCTCCTCCTACCAA 1094

Query: 1081 CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1140
Sbjct: 1095 CACTGGATCTGGGACTGCCTCAACGGGCAGGCCCTGGCCCTGCCAGCCCCATCCAGGACC 1154

Query: 1141 CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1200
Sbjct: 1155 CTGCTCCTGGCACTCCCACTGCCCCTCAGCCTCCTTGCTGCCCTCTGACTCTGTGTGCCC 1214

Query: 1201 TCCCTCACTTGTG 1213
Sbjct: 1215 TCCCTCACTTGTG 1227

Table 20. BLASTP search using the protein of CuraGen Acc. No. CG51099-03.

>ptnr:SPTREMBL-ACC:Q9UI38 TESTES-SPECIFIC PROTEIN TSP50 - Homo sapiens (Human), 385 aa. (SEQ ID NO:78)
Length = 385

Score = 2090 (735.7 bits). Expect = 4.5e-216, P = 4.5e-216
Identities = 385/385 (100%), Positives = 385/385 (100%)

Query:   1 MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60
Sbjct:   1 MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV 60

Query:  61 [illegible]
Sbjct:  61 [illegible]

Query: 121 ARRWPWMVSVRANGTHICAGTIIASQWVLTVAHCLIWRDVIYSVRVGSPWIDQMTQTASD 180
Sbjct: 121 ARRWPWMVSVRANGTHICAGTIIASQWVLTVAHCLIWRDVIYSVRVGSPWIDQMTQTASD 180

Query: 181 VPVLQVIMHSRYRAQRFWSWVGQANDIGLLKLKQELKYSNYVRPICLPGTDYVLKDHSRC 240
Sbjct: 181 VPVLQVIMHSRYRAQRFWSWVGQANDIGLLKLKQELKYSNYVRPICLPGTDYVLKDHSRC 240

Query: 241 TVTGWGLSKADGMWPQFRTIQEKEVIILNNKECDNFYHNFTKIPTLVQIIKSQMMCAEDT 300
Sbjct: 241 TVTGWGLSKADGMWPQFRTIQEKEVIILNNKECDNFYHNFTKIPTLVQIIKSQMMCAEDT 300

Query: 301 HREKFCYELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360
Sbjct: 301 HREKFCYELTGEPLVCSMEGTWYLVGLVSWGAGCQKSEAPPIYLQVSSYQHWIWDCLNGQ 360

Query: 361 ALALPAPSRTLLLALPLPLSLLAAL 385
Sbjct: 361 ALALPAPSRTLLLALPLPLSLLAAL 385
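Tables 19 and 20 can be cross-checked against each other. Translating the Table 19 query from the ATG at nucleotides 31-33 should reproduce the start of the Table 20 protein. The minimal sketch below translates nucleotides 31-120 and stops at the first stop codon. It assumes the standard genetic code, since the document does not name the code table used. The helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, enumerated in TCAG order at each codon position ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Nucleotides 1-120 of CG51099-03, copied from the Table 19 query lines.
NT_1_120 = (
    "CGGAGAGACGCAGTCGGCTGCCACCCCGGGATGGGTCGCTGGTGCCAGACCGTCGCGCGC"
    "GGGCAGCGCCCCCGGACGTCTGCCCCCTCCCGCGCCGGTGCCCTGCTGCTGCTGCTTCTG"
)


def translate(cds: str) -> str:
    """Translate codon by codon until a stop codon or the last complete codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf = NT_1_120[30:]  # 0-based index 30 is nucleotide 31, the A of the ATG
    print(orf[:3], translate(orf))
```

The translation should match residues 1-30 of the Table 20 query line.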



Table 21. ClustalW alignment of CG51099-03 protein with related proteins.

Information for the ClustalW proteins:

Accno                        Common Name                                                                 Length
CG51099-03 (SEQ ID NO:49)    novel Serine Protease-like protein
TEST_HUMAN (SEQ ID NO:124)   TESTISIN PRECURSOR (EC 3.4.21.-) (EOSINOPHIL SERINE PROTEASE 1) (ESP-1)    314
PSS8_HUMAN (SEQ ID NO:125)   PROSTASIN PRECURSOR (EC 3.4.21.-)                                          343
Q9UI38 (SEQ ID NO:78)        TESTES-SPECIFIC PROTEIN TSP50                                              385

In the alignment shown above, black outlined amino acid residues indicate residues identically conserved between sequences (i.e., residues that may be required to preserve structural or functional properties). Amino acid residues with a gray background are similar to one another between sequences: they have comparable physical and/or chemical properties, so that substituting one for another may not alter protein structure or function (e.g., the group L, V, I, and M may be considered similar). Amino acid residues with a white background are neither conserved nor similar between sequences.



Table 22. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG51099-03.

outside                            Certainty=0.5804 (Affirmative) < succ>
lysosome (lumen)                   Certainty=0.5144 (Affirmative) < succ>
microbody (peroxisome)             Certainty=0.1203 (Affirmative) < succ>
endoplasmic reticulum (membrane)   Certainty=0.1000 (Affirmative) < succ>

Is the sequence a signal peptide?

# Measure  Position  Value  Cutoff  Conclusion
  max. C   40        0.888  0.37    YES
  max. Y   40        0.848  0.34    YES
  max. S   30        0.975  0.88    YES
  mean S   1-39      0.708  0.48    YES

# Most likely cleavage site between pos. 39 and 40: CWG-AG
Hydropathy plot for CG51099-03 with a window of 19
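The plot itself is not reproduced here. Its values are sliding-window averages of per-residue hydropathy. The minimal sketch below computes such a profile over the first 60 residues of CG51099-03, taken from Table 20. It assumes the Kyte-Doolittle scale, because the document does not name the scale used, and an unweighted 19-residue window. Windows spanning the leucine-rich stretch (residues 25-32) come out clearly positive, as expected for a hydrophobic signal-peptide core. The function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scale; the source does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residues 1-60 of CG51099-03, copied from the Table 20 query line.
N_TERMINUS = "MGRWCQTVARGQRPRTSAPSRAGALLLLLLLLRSAGCWGAGEAPGALSTADPADQSVQCV"


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Unweighted sliding-window mean hydropathy.

    Element i is the mean over residues i+1 .. i+window (1-based), so a plot
    would place it at the window centre, residue i + (window + 1) // 2.
    """
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    for i, score in enumerate(hydropathy_profile(N_TERMINUS)):
        print(f"{i + 10:3d} {score:+.2f}")  # residue at the window centre, mean score
```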
SECP15

A SECP15 nucleic acid and polypeptide according to the invention includes the nucleic acid sequence (SEQ ID NO:50) and encoded polypeptide sequence (SEQ ID NO:51) of clone CG57051-04, directed toward novel Angiopoietin-like proteins and nucleic acids encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883. Putative untranslated regions, if any, are found upstream from the initiation codon and downstream from the termination codon. The encoded protein of 242 amino acid residues is presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of 0.8200, and appears to be a signal protein (see Table 27 below).

PPARG ANGIOPOIETIN-RELATED PROTEIN - PGAR: 

Background

The peroxisome proliferator-activated receptors (PPARs) are members of the nuclear hormone receptor subfamily of transcription factors. PPARs form heterodimers with retinoid X receptors (RXRs), and these heterodimers regulate transcription of various genes. There are 3 known subtypes of PPARs: PPAR-alpha (170998), PPAR-delta (600409), and PPAR-gamma. PPAR-gamma is believed to be involved in adipocyte differentiation. Tontonoz et al. (1994) found 2 isoforms of PPAR-gamma in mouse, gamma-1 and gamma-2, resulting from the use of different initiator methionines.

Elbrecht et al. (1996) cloned cDNAs of PPAR-gamma-1 and PPAR-gamma-2 from human fat cell cDNA by PCR. They used primers based on the mouse sequence and on a previously published human cDNA sequence (Greene et al., 1995). They found that the human PPAR-gamma-1 and PPAR-gamma-2 genes have identical sequences, except that PPAR-gamma-2 contains an additional 84 nucleotides at its 5-prime end. The sequences obtained by Elbrecht et al. (1996) differed at 3 sites from the previously published human PPAR-gamma-1 sequence of Greene et al. (1995). By Northern blot analysis, Elbrecht et al. (1996) found that human PPAR-gamma is expressed at high levels in adipocytes and at much lower levels in bone marrow, spleen, testis, brain, skeletal muscle, and liver.

The thiazolidinediones are synthetic compounds that can normalize elevated plasma glucose levels in obese, diabetic rodents and may be efficacious therapeutic agents for the treatment of noninsulin-dependent diabetes mellitus. Lehmann et al. (1995) identified the thiazolidinediones as high-affinity ligands for mouse PPAR-gamma receptors. Elbrecht et al. (1996) confirmed that human PPAR-gamma-1 and PPAR-gamma-2 have similar activity and determined that 3 different thiazolidinedione compounds are agonists of PPAR-gamma-1 and PPAR-gamma-2. Elbrecht et al. (1996) speculated that the antidiabetic activity of the thiazolidinediones in humans is mediated through the activation of PPAR-gamma-1 and PPAR-gamma-2.

The nuclear receptor PPARG/RXRA heterodimer regulates glucose and lipid homeostasis
and is the target for the antidiabetic drug GI262570 and the thiazolidinediones. Gampe et al.
(2000) reported the crystal structures of the PPARG and RXRA ligand-binding domains
complexed with the RXRA ligand 9-cis-retinoic acid, the PPARG agonist GI262570, and
coactivator peptides. The structures provided a molecular understanding of the ability of RXRs
to heterodimerize with many nuclear receptors and of the permissive activation of the
PPARG/RXRA heterodimer by 9-cis-retinoic acid.

Mueller et al. (1998) showed that PPAR-gamma is expressed at significant levels in
human primary and metastatic breast adenocarcinomas. Ligand activation of this receptor in
cultured breast cancer cells caused extensive lipid accumulation, changes in breast epithelial
gene expression associated with a more differentiated, less malignant state, and a reduction in
the growth rate and clonogenic capacity of the cells. Inhibition of MAP kinase, a powerful negative
regulator of PPAR-gamma, improves the thiazolidinedione ligand sensitivity of nonresponsive
cells. These data suggested that the PPAR-gamma transcriptional pathway can induce terminal
differentiation of malignant breast epithelial cells.

Tontonoz et al. (1994) identified a novel adipocyte-specific transcription factor, which
they termed ARF6, and showed that it is a heterodimeric complex of RXRA and PPARG. (This
ARF6 is not to be confused with ADP-ribosylation factor 6 (600464), which is also symbolized
ARF6.) Tontonoz et al. (1995) demonstrated that PPAR-gamma-2 regulates adipocyte expression
of the phosphoenolpyruvate carboxykinase gene (PCK1, 261680; PCK2, 261650).
The formation of foam cells from macrophages in the arterial wall is characterized by
dramatic changes in lipid metabolism, including increased expression of scavenger receptors and
the uptake of oxidized low density lipoprotein (oxLDL). Tontonoz et al. (1998) demonstrated
that the nuclear receptor PPAR-gamma is induced in human monocytes following exposure to
oxLDL and is expressed at high levels in the foam cells of atherosclerotic lesions. Ligand
activation of the PPAR-gamma:RXR-alpha heterodimer in myelomonocytic cell lines induced
changes characteristic of monocytic differentiation and promoted uptake of oxLDL through
transcriptional induction of the scavenger receptor CD36. These results revealed a novel
signaling pathway controlling differentiation and lipid metabolism in monocytic cells. Tontonoz
et al. (1998) suggested that endogenous PPAR-gamma ligands may be important regulators of
gene expression during atherogenesis.

Nagy et al. (1998) demonstrated that oxLDL activates PPAR-gamma-dependent
transcription through a signaling pathway involving scavenger receptor-mediated particle uptake.
Moreover, they identified 2 of the major oxidized linoleic acid metabolite components of
oxLDL, 9-HODE and 13-HODE, as endogenous activators and ligands of PPAR-gamma. The
authors found that the biologic effects of oxLDL are coordinated by 2 sets of receptors, one on
the cell surface, which binds and internalizes the particle, and one in the nucleus, which is
transcriptionally activated by its component lipids. Nagy et al. (1998) suggested that
PPAR-gamma may be a key regulator of foam cell gene expression.

Chawla et al. (2001) provided evidence that, in addition to lipid uptake, PPARG regulates
a pathway of cholesterol efflux. PPARG induces ABCA1 (600046) expression and cholesterol
removal from macrophages through a transcriptional cascade mediated by the nuclear receptor
LXRA (NR1H3; 602423). Ligand activation of PPARG leads to primary induction of LXRA and
to coupled induction of ABCA1. Transplantation of PPARG null bone marrow into Ldlr -/- mice
resulted in a significant increase in atherosclerosis, consistent with the hypothesis that regulation
of LXRA and ABCA1 expression is protective in vivo. Chawla et al. (2001) proposed that
PPARG coordinates a complex physiologic response to oxLDL that involves particle uptake,
processing, and cholesterol removal through ABCA1.

Fajas et al. (1997) used competitive RT-PCR to distinguish relative PPARG1 and
PPARG2 mRNA levels in tissues. They determined that PPARG2 is much less abundant than
PPARG1. The highest levels of PPARG are found in adipose tissue and large intestine, with
intermediate levels in kidney, liver, and small intestine, and barely detectable levels in muscle.
Western blot analysis showed that PPARG is expressed as a 60-kD protein. EMSA analysis
indicated that PPARG2 binds to and transactivates through a peroxisome proliferator response
element. The PPARG gene contains 9 exons and spans more than 100 kb. Through alternative
transcription start sites and alternate splicing, the mRNAs differ at their 5-prime ends, with
PPARG1 being encoded by 8 exons and PPARG2 by 7. PPARG1 uses exons A1 and A2,
whereas PPARG2 uses exon B; both use exons 1 through 6.

Martin et al. (1998) reported that there are 3 PPARG isoforms which differ at their
5-prime ends, each under the control of its own promoter. PPARG1 and PPARG3, however, give
rise to the same protein, encoded by exons 1 through 6, because neither the A1 nor the A2 exon
is translated. By RNase protection analysis, Ricote et al. (1998) showed that in phorbol ester-
stimulated macrophage cell lines, a probe to PPARG1 protected a 218-nucleotide fragment of
PPARG1, but only a 174-nucleotide fragment of PPARG3. A PPARG2 probe protected a
common 104-nucleotide fragment of both PPARG1 and PPARG3. PPARG2 itself was not
expressed in the stimulated macrophages. PPARG1 and PPARG2 promoters are primarily used
in adipose tissue. The authors speculated that other inducing factors, such as the cytokines MCSF
(120420) or GMCSF (138960), or oxidized LDL (see OLR1, 602601), might differentially
regulate expression of the 3 isoforms.

Lowell (1999) reviewed the role of PPARG in adipogenesis. 

Kersten et al. (2000) reviewed the roles of PPARs in health and disease.

Tong et al. (2000) showed that murine GATA2 (137295) and GATA3 (131320) are
specifically expressed in white adipocyte precursors and that their downregulation sets the stage
for terminal differentiation. Constitutive GATA2 and GATA3 expression suppressed adipocyte
differentiation and trapped cells at the preadipocyte stage. This effect was mediated, at least in
part, through the direct suppression of PPARG.

Mueller et al. (2000) showed that PPAR-gamma is expressed in human prostate
adenocarcinomas and in cell lines derived from these tumors. Activation of this receptor with
specific ligands exerts an inhibitory effect on the growth of prostate cancer (176807) cell lines.
They showed that prostate cancers and prostate cancer cell lines do not have intragenic mutations
in the PPARG gene, although 40% of the informative tumors have hemizygous deletions of this
gene. They conducted a phase II clinical study in patients with advanced prostate cancer using
troglitazone (Rezulin), a PPAR-gamma ligand used for the treatment of type II diabetes. Oral
treatment was administered to 41 men with histologically confirmed prostate cancer and no
symptomatic metastatic disease. An unexpectedly high incidence of prolonged stabilization of
prostate-specific antigen (KLK3; 176820) was seen in patients treated with troglitazone. In
addition, 1 patient had a dramatic decrease in serum prostate-specific antigen to nearly
undetectable levels. The findings suggested that PPAR-gamma may serve as a biologic modifier
in human prostate cancer and that its therapeutic potential should be further studied.

By somatic cell hybridization and linkage analysis, Greene et al. (1995) mapped the
human PPARG gene to 3p25. Beamer et al. (1997) mapped the gene to 3p25 by fluorescence in
situ hybridization.

Meirhaeghe et al. (1998) detected a polymorphism corresponding to a silent C-to-T 
substitution in exon 6 of the PPARG gene (601487.0009). 

Since PPARG is a transcription factor that has a key role in adipocyte differentiation,
Ristow et al. (1998) investigated whether mutations of the gene encoding this factor predispose
people to obesity. They studied 358 unrelated German subjects, including 121 obese subjects,
looking for mutations in the PPARG2 gene at or near a site of serine phosphorylation at position
114 that negatively regulates the transcriptional activity of the protein. Four of the 121 obese
subjects had a missense mutation in the PPARG2 gene that resulted in conversion of proline to
glutamine at position 115 (601487.0001), as compared with none of the 237 subjects of normal
weight. All the subjects with the mutant allele were markedly obese. Overexpression of the
mutant gene in murine fibroblasts led to the production of a protein in which the phosphorylation
of serine at position 114 was defective, as well as accelerated differentiation of the cells into
adipocytes and greater cellular accumulation of triglyceride than with wildtype
PPAR-gamma-2. These effects were similar to those of an in vitro mutation created directly at the
ser114 phosphorylation site.
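The carrier counts above (4 of 121 obese subjects versus 0 of 237 subjects of normal weight) can be checked with a two-sided Fisher's exact test. The sketch below uses only the Python standard library; the choice of test is our own assumption, and the resulting p-value (about 0.013) is an illustrative calculation, not a figure reported by Ristow et al. (1998).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: obese / normal weight; columns: carriers / non-carriers.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k: int) -> float:
        # Probability that k of the col1 carriers fall in row 1.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(k) for k in range(lo, hi + 1)]
    # A small relative tolerance guards against floating-point ties.
    return sum(q for q in probs if q <= p_obs * (1 + 1e-9))


# 4 carriers among 121 obese subjects, 0 among 237 normal-weight subjects.
p_value = fisher_exact_two_sided(4, 117, 0, 237)
print(f"two-sided p = {p_value:.4f}")
```

Because no normal-weight subject carried the allele, the observed table is the most extreme one possible, so the two-sided p-value equals the probability of that single table.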

PPARG1 and PPARG2 have ligand-dependent and ligand-independent activation domains.
PPARG2 has an additional 28 amino acids at the amino terminus that render its ligand-
independent activation domain 5- to 10-fold more effective than that of PPARG1. Insulin
stimulates the ligand-independent activation of PPARG1 and PPARG2; however, obesity and
nutritional factors influence only the expression of PPARG2 in human adipocytes. Deeb et al.
(1998) reported that a relatively common pro12-to-ala substitution in PPARG2 (601487.0002) is
associated with lower body mass index and improved insulin sensitivity among middle-aged and
elderly Finns. A significant odds ratio (4.35, P = 0.028) for the association of the pro/pro
genotype with type 2 diabetes was observed among Japanese Americans. The PPARG2 ala allele
showed decreased binding affinity to the cognate promoter element and reduced ability to
transactivate responsive promoters. These findings suggested that the PPARG2 pro12-to-ala
polymorphism may contribute to the observed variability in BMI and insulin sensitivity in the
general population.

Valve et al. (1999) investigated the frequencies of the pro12-to-ala polymorphism in exon
B and the silent CAC478-to-CAT polymorphism in exon 6 of the PPARG gene and their effects
on body weight, body composition, and energy expenditure in obese Finnish patients. The
frequencies of the ala12 allele in exon B and the CAT478 allele in exon 6 were not significantly
different between the obese and population-based control subjects (0.14 vs 0.13 and 0.19 vs 0.21,
respectively). The polymorphisms were associated with increased BMI, and the 5 women with
both ala12ala and CAT478CAT genotypes were significantly more obese than the
women having both pro12pro and CAC478CAC genotypes, and they had increased fat mass. The
authors concluded that the pro12-to-ala and CAC478-to-CAT polymorphisms in the PPARG
gene are associated with severe overweight and increased fat mass among obese women.

Sarraf et al. (1999) identified 4 somatic mutations (1 nonsense, 1 frameshift, and 2
missense) in the PPARG gene among 55 sporadic colon cancers (114500). Each mutation greatly
impaired the function of the PPARG protein. The 472delA mutation (601487.0003) resulted in
the deletion of the entire ligand-binding domain. Q286P (601487.0004) and K319X
(601487.0005) retained a total or partial ligand-binding domain but lost the ability to activate
transcription through a failure to bind ligands. R288H (601487.0006) showed a normal
response to synthetic ligands but greatly decreased transcription and binding when exposed to
natural ligands. These data indicated that colon cancer in humans is associated with
loss-of-function mutations in the PPARG gene.

Barroso et al. (1999) reported 2 different heterozygous mutations in the ligand-binding
domain of PPARG in 3 subjects with severe insulin resistance (604367). In the PPAR-gamma
crystal structure, the mutations destabilized helix 12, which mediates transactivation. Consistent
with this, both receptor mutants were markedly transcriptionally impaired and, moreover, were
able to inhibit the action of coexpressed wildtype PPAR-gamma in a dominant-negative manner.
In addition to insulin resistance, all 3 subjects developed type 2 diabetes mellitus and
hypertension at an unusually early age. Barroso et al. (1999) concluded that their findings
represented the first germline loss-of-function mutations in PPAR-gamma and provided
compelling genetic evidence that this receptor is important in the control of insulin sensitivity,
glucose homeostasis, and blood pressure in man.

Kroll et al. (2000) reported that t(2;3)(q13;p25), a translocation identified in a subset of
human thyroid follicular carcinomas, results in fusion of the DNA-binding domains of the
thyroid transcription factor PAX8 (167415) to domains A to F of PPARG1. PAX8/PPARG1
mRNA and protein were detected in 5 of 8 thyroid follicular carcinomas but not in 20 follicular
adenomas, 10 papillary carcinomas, or 10 multinodular hyperplasias. PAX8/PPARG1 inhibited
thiazolidinedione-induced transactivation by PPARG1 in a dominant-negative manner. The
experiments demonstrated an oncogenic role for PPARG and suggested that PAX8/PPARG1
may be useful in the diagnosis and treatment of thyroid carcinoma.

ANIMAL MODEL

The nuclear hormone receptor PPARG promotes adipogenesis and macrophage
differentiation and is a primary pharmacologic target in the treatment of type II diabetes. Barak
et al. (1999) showed that PPARG gene knockout in mice resulted in 2 independent lethal phases.
Initially, PPARG deficiency interfered with terminal differentiation of the trophoblast and
placental vascularization, leading to severe myocardial thinning and death by E10.0.
Supplementing PPARG null embryos with wildtype placentas via aggregation with tetraploid
embryos corrected the cardiac defect, implicating a previously unrecognized dependence of the
developing heart on a functional placenta. A tetraploid-rescued mutant surviving to term
exhibited another lethal combination of pathologies, including lipodystrophy and multiple
hemorrhages. These findings both confirmed and expanded the known spectrum of
physiologic functions regulated by PPARG.

Kubota et al. (1999) generated homozygous PPARG-deficient mouse embryos, which
died at 10.5 to 11.5 days postcoitum due to placental dysfunction. Heterozygous PPARG-
deficient mice were protected from the development of insulin resistance due to adipocyte
hypertrophy under a high-fat diet. These phenotypes were abrogated by PPARG agonist
treatment. Heterozygous PPARG-deficient mice showed overexpression and hypersecretion of
leptin despite the smaller size of adipocytes and decreased fat mass, which may explain these
phenotypes at least in part. This study revealed an unpredicted role for PPARG in high-fat
diet-induced obesity due to adipocyte hypertrophy and insulin resistance, which requires both
alleles of PPARG.

Rosen et al. (1999) demonstrated that mice chimeric for wildtype and PPARG null cells
showed little or no contribution of null cells to adipose tissue, whereas most other organs
examined did not require PPARG for proper development. In vitro, the differentiation of
embryonic stem cells into fat was shown to be dependent on PPARG gene dosage. These data
provided direct evidence that PPARG is essential for the formation of fat.

The thiazolidinedione (TZD) class of insulin-sensitizing, antidiabetic drugs interacts with
PPAR-gamma. Miles et al. (2000) conducted metabolic studies in PPARG gene knockout mice.
Because homozygous PPARG-null mice die during development, they studied glucose metabolism
in mice heterozygous for the mutation. They identified no statistically significant differences in
body weight, basal glucose, insulin, or free fatty acid levels between the wildtype and
heterozygous groups. Nor was there a difference in glucose excursion between the groups of
mice during oral glucose tolerance tests. However, insulin concentrations of the wildtype group
were greater than those of the heterozygous deficient group, and the insulin-induced increase in
glucose disposal rate was significantly greater in the heterozygous mice. Likewise, the
insulin-induced suppression of hepatic glucose production was significantly greater in the
heterozygous mice than in wildtype mice. Taken together, these results indicated that,
counterintuitively, although pharmacologic activation of PPAR-gamma improves insulin
sensitivity, a similar effect is obtained by genetically reducing the expression levels of the
receptor.

ALLELIC VARIANTS (selected examples) 

.0001 OBESITY, SEVERE [PPARG, PRO115GLN]

In 4 German subjects with severe obesity (601665), Ristow et al. (1998) identified a
pro115-to-gln mutation of the PPAR-gamma-2 gene. Significantly, the mutation was in the
codon immediately adjacent to a serine-114 phosphorylation site. The pro115-to-gln mutation
occurs in exon 6, which is shared by all 3 forms of PPAR-gamma (Wang et al., 1999).

.0002 PPARG2 POLYMORPHISM C/G [PPARG, PRO12ALA]
OBESITY, PROTECTION AGAINST DIABETES MELLITUS, TYPE II,
SUSCEPTIBILITY TO, INCLUDED

Because the product of the PPARG gene is a nuclear receptor that regulates adipocyte
differentiation and possibly lipid metabolism and insulin sensitivity, Yen et al. (1997) screened
for mutations in the entire coding region of the PPARG gene in 26 diabetic Caucasians with or
without obesity (601665). They found a CCG (pro)-to-GCG (ala) missense mutation at codon 12
(P12A). The allele frequency of the mutation varied from 0.12 in Caucasian Americans to 0.10 in
Chinese. Beamer et al. (1998) noted that the P12A substitution lies within the domain of
PPAR-gamma-2 that enhances ligand-independent activation, that the substitution of alanine for
proline is nonconservative, and that this amino acid change might cause a significant alteration
in protein structure. To test the hypothesis that individuals with the variant are at increased
genetic risk for obesity and/or insulin resistance, they performed association studies in 2
independently recruited cohorts of unrelated, nondiabetic, adult Caucasian subjects. They found
that the P12A mutation was associated with higher BMI in both cohorts, suggesting that the
mutation may contribute to genetic susceptibility to the multifactorial disorder of obesity.

Deeb et al. (1998) studied a polymorphism of the PPARG gene, a C-to-G variant that
created an HgaI restriction site and predicted the substitution of alanine for proline at position 12
in the PPARG2-specific exon B. In a group of Finnish men and women with a PPARG2 ala
allele frequency of 0.12, they found that this allele was associated with lower fasting insulin
levels (P = 0.011) and BMI (P = 0.027) and higher insulin sensitivity (P = 0.047). This
association was independent of sex. The findings were verified by studies in a group of elderly
subjects. They also studied the association of the pro12-to-ala substitution in PPARG2 with type
2 diabetes (125853) in a group of second-generation Japanese-American (Nisei) men and women
that included individuals with type 2 diabetes, impaired glucose tolerance, and normal controls.
The ala allele was less frequent among subjects with type 2 diabetes (0.022) than among normal
controls (0.092). The odds ratio for association of pro/pro with diabetes was significant (4.35,
P = 0.028), whereas the frequency of the ala allele among subjects with impaired glucose
tolerance was intermediate (0.039). Deeb et al. (1998) suggested that the lower transactivation
capacity of the ala variant of PPARG2 underlies the association of this allele with lower BMI and
higher insulin sensitivity. The ala isoform may lead to less efficient stimulation of PPARG target
genes and predispose to lower levels of adipose tissue mass accumulation, which in turn may be
responsible for improved insulin sensitivity.
Altshuler et al. (2000) evaluated 16 published genetic associations to type 2 diabetes and
related subphenotypes using a family-based design to control for population stratification, and
replication samples to increase power. They confirmed only 1 association, that of the common
pro12-to-ala polymorphism in PPAR-gamma with type 2 diabetes. By analyzing over 3,000
individuals, they found a modest (1.25-fold) but significant (P = 0.002) increase in diabetes risk
associated with the more common proline allele (approximately 85% frequency). Because the
risk allele occurs at such high frequency, its modest effect translates into a large
population-attributable risk, influencing as much as 25% of type 2 diabetes in the general
population.
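How a modest per-allele effect becomes a large population-attributable risk (PAR) can be illustrated with Levin's formula, PAR = p(RR - 1) / [1 + p(RR - 1)], where p is the frequency of the risk factor and RR its relative risk. The sketch below is our own illustration, not the calculation performed by Altshuler et al. (2000). It plugs in the approximately 85% proline frequency and the 1.25-fold risk quoted above, and contrasts an allele-level calculation with an assumed multiplicative genotype model under Hardy-Weinberg proportions.

```python
def levin_par(exposure_freq: float, relative_risk: float) -> float:
    """Levin's population-attributable risk fraction for a single exposure."""
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


def multiplicative_genotype_par(risk_allele_freq: float, per_allele_rr: float) -> float:
    """PAR under an assumed multiplicative model with Hardy-Weinberg genotypes.

    The non-risk homozygote (ala/ala here) is the reference group with risk 1;
    heterozygotes have risk RR and risk-allele homozygotes have risk RR**2.
    """
    p, q = risk_allele_freq, 1.0 - risk_allele_freq
    mean_risk = q * q * 1.0 + 2 * p * q * per_allele_rr + p * p * per_allele_rr ** 2
    return (mean_risk - 1.0) / mean_risk


p_pro, rr = 0.85, 1.25  # proline allele frequency and per-allele relative risk
print(f"allele-level Levin PAR:      {levin_par(p_pro, rr):.2f}")
print(f"multiplicative-genotype PAR: {multiplicative_genotype_par(p_pro, rr):.2f}")
```

With these inputs the two models give roughly 0.18 and 0.32. The figure of about 25% cited above falls between them, which shows that the exact PAR depends on the genetic model and the reference group chosen.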

.0003 CANCER OF COLON [PPARG, 1-BP DEL, 472A] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, a 1-bp deletion at nucleotide 472, which resulted in a frameshift.

.0004 CANCER OF COLON [PPARG, GLN286PRO] 

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, an A-to-G transition at nucleotide 857, which resulted in a
gln286-to-pro substitution.

.0005 CANCER OF COLON [PPARG, LYS319TER] 

In a sporadic colon cancer (114500), Sarraf et al. (1999) identified a somatic mutation in
the PPARG gene, an A-to-T transversion at nucleotide 955, which resulted in a lys319-to-ter 
substitution. 

.0006 CANCER OF COLON [PPARG, ARG288HIS]

In a sporadic colon cancer (114500) tumor, Sarraf et al. (1999) identified a somatic
mutation in the PPARG gene, a G-to-A transition at nucleotide 863, which resulted in an arg288- 
to-his substitution. 

.0007 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, PRO467LEU]

In a patient with severe insulin resistance, type 2 diabetes mellitus, and hypertension
(604367) who had been diagnosed in her twenties, Barroso et al. (1999) detected a C-to-T
transition in the PPARG gene resulting in a proline-to-leucine mutation at codon 467 (P467L).
Her son, aged 30 years, who also had a history of early-onset diabetes and hypertension, was also
heterozygous for the P467L mutation. All other family members, including both parents of the
proband, none of whom were known to have diabetes or hypertension, were homozygous for the
wildtype receptor sequence. Nonpaternity was excluded, indicating a de novo appearance of the
mutation in the proband.

.0008 DIABETES MELLITUS, INSULIN-RESISTANT, WITH ACANTHOSIS
NIGRICANS AND HYPERTENSION [PPARG, VAL290MET]

In a 15-year-old patient with primary amenorrhea, hirsutism, acanthosis nigricans,
elevated blood pressure, and markedly elevated fasting and postprandial insulin levels (604367),
Barroso et al. (1999) identified a G-to-A transition in the PPARG gene resulting in a
valine-to-methionine mutation at codon 290 (V290M). By age 17 the patient had developed type 2
diabetes and had hypertension that required treatment with beta-blockers. Her clinically
unaffected mother and sister were both wildtype at this locus; screening of the deceased father
was not possible.

.0009 PPARG POLYMORPHISM C-T [PPARG, 161C-T]

Meirhaeghe et al. (1998) reported a 161C-T substitution in exon 6 of the PPARG gene.
Since PPAR-gamma is a transcription factor implicated in adipocyte differentiation and in lipid
and glucose metabolism, they analyzed the relationships between this genetic polymorphism and
various markers of the obesity phenotype in a representative sample of 820 men and women
living in northern France. The frequencies of the C and T alleles were 0.860 and 0.140,
respectively. In the whole sample, no association of the polymorphism with the markers tested
was observed, but a statistically significant interaction (P less than 0.03) existed between this
polymorphism and body mass index (BMI) for plasma leptin levels. Obese subjects bearing at
least one T allele had higher plasma leptin levels than subjects who did not. This effect existed in
both genders, despite the higher plasma leptin levels observed in women. Thus, for a given leptin
level, the BMI was relatively lower in obese subjects carrying at least one T allele than in obese
CC homozygotes.

Wang et al. (1999) studied this polymorphism in 647 Australian Caucasian patients aged
65 years or less, with or without angiographically documented coronary artery disease. The
frequencies of the CC, CT, and TT genotypes were 69.8%, 27.7%, and 2.5%, respectively, and
the T allele frequency was 0.163. These frequencies were in Hardy-Weinberg equilibrium and did
not differ between men and women. Wang et al. (1999) found that the T allele carriers (CT and
TT genotypes) had significantly reduced coronary artery disease risk compared with the CC
homozygotes, with an odds ratio of 0.457. Association with obesity (601665) was not found in
these patients. The authors interpreted this to indicate that the PPARG gene may have a
significant role in atherogenesis, independent of obesity and of lipid abnormalities, possibly via a
direct local vascular wall effect.
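The Hardy-Weinberg statement above can be checked directly from the reported genotype proportions. The sketch below derives the T allele frequency and a 1-degree-of-freedom chi-square statistic. It assumes that the rounded percentages apply to all 647 patients, because the raw genotype counts are not given here.

```python
def hardy_weinberg_check(p_cc: float, p_ct: float, p_tt: float, n: int):
    """Return (T allele frequency, chi-square statistic) for a Hardy-Weinberg test.

    Inputs are genotype proportions (summing to 1) and the sample size n.
    """
    q = p_tt + p_ct / 2.0  # frequency of the T allele
    p = 1.0 - q            # frequency of the C allele
    expected = (p * p, 2 * p * q, q * q)
    observed = (p_cc, p_ct, p_tt)
    # Pearson chi-square on counts: n * sum((obs - exp)^2 / exp) over proportions.
    chi2 = n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return q, chi2


q_t, chi2 = hardy_weinberg_check(0.698, 0.277, 0.025, n=647)
print(f"T allele frequency = {q_t:.4f}, chi-square = {chi2:.2f}")
```

The allele frequency comes out at about 0.16, matching the reported 0.163, and the chi-square statistic is about 0.1, well below the 3.84 critical value for P = 0.05 at 1 degree of freedom. This is consistent with the reported equilibrium.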

Using a subtractive cloning strategy to identify downstream targets of peroxisome
proliferator-activated receptor-gamma (PPARG; 601487), and by screening cDNA libraries,
Yoon et al. (2000) isolated mouse and human cDNAs encoding PGAR. The 406-amino acid,
60-kD human PGAR protein, which shares 75% amino acid identity with the mouse protein, is a
member of the angiopoietin family of secreted proteins and bears highest similarity to
angiopoietin-2 (ANGPT2; 601922). Like other members of this family, PGAR contains a
predicted coiled-coil quaternary structure, and the authors hypothesized that PGAR may form
multimeric or other higher-order structures. PGAR has a secretory signal peptide, 3 potential
N-glycosylation sites, and 4 cysteines that may be available for intramolecular disulfide bonding.
Northern blot analysis detected a 2-kb PGAR transcript that was highly enriched in white fat and
placenta. In situ hybridization analysis revealed expression of mouse Pgar at low levels in most
organs and connective tissue at embryonic day 13.5 (E13.5). Between E15.5 and E18.5, Pgar
expression was strongest in brown fat. Northern blot analysis detected elevated levels of Pgar
expression in mouse models of obesity and diabetes. Alterations in nutrition and leptin (164160)
administration in mice modulated Pgar expression in vivo. Yoon et al. (2000) demonstrated that
PPARG ligand-induced transcription of PGAR follows a rapid time course typical of
immediate-early genes and occurs in the absence of protein synthesis. Using a culture model
system, they observed that induction of the PGAR transcript coincides with hormone-dependent
adipocyte differentiation. Yoon et al. (2000) concluded that PGAR is a bona fide target of PPARG
and may have a role in the regulation of systemic lipid metabolism or glucose homeostasis.

Kersten et al. (2000) identified mouse Pgar, which they called Fiaf (fasting-induced
adipose factor), using a subtractive hybridization assay to identify PPARA (170998) target
genes. Northern blot analysis detected expression of Fiaf in mouse white and brown adipose
tissue, with weak expression in lung, kidney, and liver. Using a combination of wildtype, Ppara
mutant, and Pparg mutant mice, Kersten et al. (2000) demonstrated that Fiaf mRNA expression is
stimulated by PPARA in liver and by PPARG in white adipose tissue. Expression of Fiaf was
upregulated in liver and white adipose tissue during fasting. Western blot analysis showed that
the abundance of Fiaf in plasma decreased with high-fat feeding, an effect directly opposite to
that observed with leptin.

By radiation hybrid analysis, Yoon et al. (2000) mapped the PGAR gene to 19p13.3.

The DNA and protein sequences for the novel Angiopoietin-like gene are reported here 
as CuraGen Acc. No. CG57051-04. 

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 716 of 733 bases (97%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens
angiopoietin-like protein PP1158 mRNA, complete cds) (Table 23). The full amino acid sequence
of the protein of the invention was found to have 181 of 183 amino acid residues (98%) identical
to, and 182 of 183 amino acid residues (99%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE
PROTEIN PP1158) (Table 24).
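The percentages quoted above appear to be truncated rather than rounded: 716/733 is 97.7% but is reported as 97%, and 181/183 is 98.9% but is reported as 98%. The sketch below reproduces the reported integers under that assumed convention. The function name is illustrative only and does not correspond to any particular alignment tool.

```python
def percent_identity(matches: int, aligned_length: int) -> int:
    """Integer percentage, truncated toward zero (assumed reporting convention)."""
    return 100 * matches // aligned_length


for label, m, n in [
    ("nucleotide identity", 716, 733),
    ("amino acid identity", 181, 183),
    ("amino acid similarity", 182, 183),
]:
    print(f"{label}: {m}/{n} = {percent_identity(m, n)}%")
```

Rounding to the nearest integer would instead give 98% for the nucleotide comparison and 99% for the amino acid identity, so truncation is the convention that matches the figures in the text.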

A multiple sequence alignment is given in Table 26, with the protein of the invention
shown on the first line of a ClustalW analysis comparing it with related protein sequences.
Please note that this sequence represents a splice form of Angiopoietin, as indicated at positions
184L to 347G, and contains the SNPs Q24R and G25S.

The presence of identifiable domains in the protein disclosed herein was determined by 
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then 
identified by the Interpro domain accession number. Significant domains are summarized below: 

Model          Domain  seq-f  seq-t      hmm-f  hmm-t      score  E-value
fibrinogen_C   1/1     184    236 ..     204    272 .]     31.7   4.1e-08

IPR002181; Fibrinogen_C
Fibrinogen [1], the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds. (SEQ ID NO: 126)

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx

C: conserved cysteine involved in a disulfide bond.

Such a domain has recently been found in the other proteins listed below.

Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins of
about 260 amino acids that have a fibrinogen beta/gamma C-terminal domain.

The C-terminus of the Drosophila protein scabrous (gene sca). Scabrous is involved in the
regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

The C-terminus of a mammalian T-cell-specific protein of unknown function.

The C-terminus of a human protein of unknown function that is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this domain, and similar to the properties of the domain itself.

Chromosomal information:

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence, and CuraGen
Corporation's Electronic Northern bioinformatic tool.
Tissue Expression

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Heart, Aorta, Coronary Artery, Umbilical Vein, Adrenal
Gland/Suprarenal gland, Pancreas, Islets of Langerhans, Thyroid, Pineal Gland, Parotid Salivary
glands, Liver, Small Intestine, Duodenum, Colon, Bone Marrow, Lymph node, Bone, Cartilage,
Synovium/Synovial membrane, Skeletal Muscle, Brain, Thalamus, Pituitary Gland, Amygdala,
Hippocampus, Spinal Cord, Mammary gland/Breast, Ovary, Placenta, Uterus, Vulva, Prostate,
Testis, Lung, Kidney, Retina, Skin, and Foreskin. Expression information was derived from the
tissue sources of the sequences that were included in the derivation of the sequence of CuraGen
Acc. No. CG57051-04.

Cellular Localization and Sorting 

The PSORT, SignalP, and hydropathy profiles for the Angiopoietin-like protein are shown
in Table 27. Although PSORT suggests that the Angiopoietin-like protein may be localized in
the cytoplasm, the protein of CuraGen Acc. No. CG57051-04 predicted here is similar to the
Fibrinogen family, some members of which are secreted. It is therefore likely that this novel
Angiopoietin-like protein is localized to the same subcellular compartment as those secreted
family members.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 20, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid, any of whose bases may be changed from the
corresponding base shown in Fig. 1, that still encodes a protein maintaining its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-04, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 3% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 20. The invention also includes a mutant or variant protein, any of whose
residues may be changed from the corresponding residue shown in Figure 20, that still
maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid
residues may be so changed.
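The percentage limits above become a concrete number of positions once a sequence length is fixed. The sketch below is illustrative only: it assumes that "up to about N%" is read as a hard ceiling rounded down to a whole number of positions (the text does not say how to round), and it uses the 242-residue length listed for CG57051-04 in the ClustalW information table purely as an example. The function name `max_changed_positions` is hypothetical.

```python
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions within `percent`% of `length`.

    Assumption (not stated in the source): the limit is a ceiling and is
    rounded down; integer arithmetic avoids floating-point rounding issues.
    """
    if length < 0 or percent < 0:
        raise ValueError("length and percent must be non-negative")
    return (length * percent) // 100


# Example: a 242-residue protein with the "up to about 2%" allowance.
print(max_changed_positions(242, 2))  # 4
```

Under this reading, a variant of the 242-residue protein could differ at no more than 4 residues; a different rounding convention would change the count.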

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino terminus or the carboxyl terminus of the present
CG57051-04 polypeptide. In certain embodiments, a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein, such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-04 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier), such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as research tools. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I deficiency;
glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other diseases,
disorders and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 


Table 23. BLASTN search using CuraGen Acc. No. CG57051-04.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp.
Length = 1943 (SEQ ID NO: 79)

Plus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Plus / Plus

Query:     2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:    62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:   122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:   182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
             |||||||||||||||||||||||||||||||||||||||||| ||  | |||||||||||
Sbjct:   200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:   241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:   301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:   361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:   421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:   481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:   541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:   601 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:   661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGAG-GCTGGTGGTTTGGCA 719
             |||||||||||||||||||||||||||||| |||||||||||   ||    || ||| ||
Sbjct:   679 CCAGCCAGTTGACCCGGCTCACAATGTCAGGCGCCTGCACCGGCTGCCCAGGGATTGCCA 738

             |||  |||||
Sbjct:       AGCTGTTCCA 750
Score = 1182 (177.3 bits), Expect = 7.8e-202, Sum P(2) = 7.8e-202
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Plus / Plus

Query:   693 GCCTGCACCG-AGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 751
             |||| | | | |||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1203 GCCT-CTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTAC 1261

Query:   752 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 811
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1262 TTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGG 1321

Query:   812 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 871
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1322 CGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAG 1381

Query:   872 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 931
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1382 GCAGCCTCCTAGCGTCCTGGCTGGGCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTG 1441

Query:   932 GCTCTG 937
             ||||||
Sbjct:  1442 GCTCTG 1447



Table 24. BLASTP search using the protein of CuraGen Acc. No. CG57051-04.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO: 80)
Length = 406

Score = 929 (327.0 bits), Expect = 4.4e-126, Sum P(2) = 4.4e-126
Identities = 181/183 (98%), Positives = 182/183 (99%)

Query:    61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:   121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

4.4e-126
Identities = 60/62 (96%), Positives = 60/62 (96%)

Query:   181 LHRGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 240
             |  |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   345 LSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAAEA 404

Score = 49 (17.2 bits), Expect = 2.4e-33, Sum P(2) = 2.4e-33
Identities = 14/40 (35%), Positives = 20/40 (50%)

Query:     1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE 40
             + |  ||  +| | |  |  | | + |    |  |++||+
Sbjct:       LGGEDTA-YSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQ
Table 25. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using
CuraGen Acc. No. CG57051-04.

>s3aq:230527544, 2394 bp. (SEQ ID NO: 81)
Length = 2394

Minus Strand HSPs:

Score = 3468 (520.3 bits), Expect = 1.2e-202, Sum P(2) = 1.2e-202
Identities = 716/733 (97%), Positives = 716/733 (97%), Strand = Minus / Plus

Query:   734 TGGAATGGCTGCAGGTGCCAAACCACCAGCCTC-GGTGCAGGCGGCTGACATTGTGAGCC 676
             |||||  ||| | ||   ||| ||    ||  | ||||||||||||||||||||||||||
Sbjct:  1645 TGGAACAGCTCCTGG---CAATCCCTGGGCAGCCGGTGCAGGCGGCTGACATTGTGAGCC 1701

Query:   675 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 616
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1702 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 1761

Query:   615 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 556
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1762 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 1821

Query:   555 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 496
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1822 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 1881

Query:   495 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 436
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1882 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 1941

Query:   435 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 376
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1942 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 2001

Query:   375 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2002 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGG 2061

Query:   315 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2062 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 2121

Query:   255 CGCGGCGACTTGGACTGCACGGGTCCAGATCT-AGCGCTCAGTAGCACGGCGGTGGCGGC 197
             |||||||||||||||||||||||||| |  || |||||||||||||||||||||||||||
Sbjct:  2122 CGCGGCGACTTGGACTGCACGGGTCC-GCCCTGAGCGCTCAGTAGCACGGCGGTGGCGGC 2180

Query:   196 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2181 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 2240

Query:   136 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 77
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2241 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 2300

Query:    76 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 17
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  2301 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 2360

1.2e-202 (SEQ ID NO: 127)

Query:   937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   948 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1007

Query:   877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Query:   817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1068 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1127

Query:   757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:  1128 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1187

Query:   698 GCAGGC 693
               ||||
Sbjct:         AGGC 1192



>s3aq:218296061, 1862 bp. (SEQ ID NO: 82)
Length = 1862

Minus Strand HSPs:

Score = 3444 (516.7 bits), Expect = 1.8e-201, Sum P(2) = 1.8e-201
Identities = 714/733 (97%), Positives = 714/733 (97%), Strand = Minus / Plus

Query:   734 TGGAATGGCTGCAGGTGCCAAACCACCAGCCTC-GGTGCAGGCGGCTGACATTGTGAGCC 676
             |||||  ||| | ||   ||| ||    ||  | ||||||||||||||||||||||||||
Sbjct:  1133 TGGAACAGCTCCTGG---CAATCCCTGGGCAGCCGGTGCAGGCGGCTGACATTGTGAGCC 1189

Query:   675 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 616
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1190 GGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCTTTCTTCGGGCAGGCTTGGCCACCTCA 1249

Query:   615 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 556
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1250 TGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACTGGCTTTGCAGATGCTGAATTCGCAGG 1309

Query:   555 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 496
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1310 TGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAAGAGTTGCTGGATCCTG 1369

Query:   495 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTCAGGGTCCACCCGGCTC 436
             |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
Sbjct:  1370 CTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCCCAGGGTCCACCCGGCTC 1429

Query:   435 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 376
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1430 TCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACAGGCGGACCCGCACGCG 1489

Query:   375 CTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 316
             ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1490 CTCAGGCGC-GCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTCCGCGTGTTCGCGCAGC 1548

Query:   315 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 256
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1549 CCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTCGTCCCAGGACGCAAAG 1608

Query:   255 CGCGGCGACTTGGACTGCACGGGTCCAGATCT-AGCGCTCAGTAGCACGGCGGTGGCGGC 197
             |||||||||||||||||||||||||| |  || |||||||||||||||||||||||||||
Sbjct:  1609 CGCGGCGACTTGGACTGCACGGGTCC-GCCCTGAGCGCTCAGTAGCACGGCGGTGGCGGC 1667

Query:   196 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 137
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1668 GCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCATCCTCTTAGGTAGCCTGGG 1727

Query:   136 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 77
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  1728 AGCGGGGATTCGGGGACTCTCGGGGACGTTGGGGTTCCAGGTGCGAGGACTGGAGACGCG 1787

Query:    76 GAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAAGCCGCTGGAAAGAATCGGATCACAGT 17

Query:    16 CGTGTGAGGATCCGC 2
             |||||||||||||||
Sbjct:  1848 CGTGTGAGGATCCGC 1862

Score = 1182 (177.3 bits), Expect = 1.8e-201, Sum P(2) = 1.8e-201 (SEQ ID NO: 128)
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:   937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   436 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 495

Query:   877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   496 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 555

Query:   817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   556 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 615

Query:   757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:   616 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 675

Query:   698 GCAGGC 693
             | ||||
Sbjct:   676 G-AGGC 680



>s3aq:217940431 Category E: 530 bp. (SEQ ID NO: 83)
Length = 530

Minus Strand HSPs:

Score = 1800 (270.1 bits), Expect = 1.2e-75, P = 1.2e-75
Identities = 384/403 (95%), Positives = 384/403 (95%), Strand = Minus / Plus

Query:   631 AGGCTTGGCCACC-TCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTT 576
             || | ||| | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||
Sbjct:   128 AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185

Query:   575 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 516
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   186 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245

Query:   515 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 456
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   246 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305

Query:   455 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 396
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   306 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365

Query:   395 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 336
             ||||||||||||||||||||||||||||||| | ||||||||||||||||||||||||||
Sbjct:   366 GACAGGCGGACCCGCACGCGCTCAGGCGCCGTTTCAGCGCGCTCAGCTGACTGCGGGTGC 425

Query:   335 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 276
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   426 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485

Query:   275 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 231
             |||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   486 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530



>s3aq:230121563, 788 bp. (SEQ ID NO: 84)
Length = 788

Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 6.4e-48, P = 6.4e-48
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:   937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   171 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230

Query:   877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   231 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290

Query:   817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   291 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 350

Query:   757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:   351 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 410

Query:   698 GCAGGC 693
             | ||||
Sbjct:   411 G-AGGC 415



>s3aq:217939973, 631 bp. (SEQ ID NO: 85)
Length = 631

Minus Strand HSPs:

Score = 1182 (177.3 bits), Expect = 8.0e-48, P = 8.0e-48
Identities = 242/245 (98%), Positives = 242/245 (98%), Strand = Minus / Plus

Query:   937 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 878
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   105 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 164

Query:   877 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 818
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   165 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224

Query:   817 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 758
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   225 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 284

Query:   757 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCG-GT 699
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  |
Sbjct:   285 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344

Query:   698
             ||
Sbjct:   345



>s3aq:217939964, 328 bp. (SEQ ID NO: 86)
Length = 328

Plus Strand HSPs:

Score = 777 (116.6 bits), Expect = 3.0e-29, P = 3.0e-29
Identities = 157/159 (98%), Positives = 157/159 (98%), Strand = Plus / Plus

Query:   779 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 838
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:     1 AAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCC 60

Query:   839 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 898
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    61 ACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGGGCC 120

Query:   899 TGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 937
             |||||||||||| |||||||||||||||||||||||| |
Sbjct:   121 TGGTCCCAGGCCAACGAAAGACGGTGACTCTTGGCTCCG 159
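In the Minus / Plus HSPs of Table 25, each Query row is printed as the reverse complement of the plus-strand query, which is why its coordinates run downward. A minimal Python sketch (standard library only; the helper name `reverse_complement` is hypothetical) confirms this relationship on a 26-base stretch that appears in a Plus / Plus Query row of Table 23 and, reverse-complemented, at the end of the first Minus / Plus Query row of Table 25. It assumes ungapped input; a "-" gap character would pass through unchanged.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an ungapped DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


plus_strand = "GGCTCACAATGTCAGCCGCCTGCACC"   # Plus / Plus Query row, Table 23
minus_strand = "GGTGCAGGCGGCTGACATTGTGAGCC"  # Minus / Plus Query row, Table 25

print(reverse_complement(plus_strand) == minus_strand)  # True
```

Applying the function twice returns the original sequence, so the same helper can be used in either direction when reading the Minus-strand rows.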



Table 26. ClustalW alignment of CG57051-04 protein with related proteins.

CG57051-04   PVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE
CG57051-02   PVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE
Q9HBV4       PVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE
CG57051-03   PVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE

CG57051-04   RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF
CG57051-02   RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF
Q9HBV4       RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF
CG57051-03   RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF

CG57051-04   HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR
CG57051-02   HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR
Q9HBV4       HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR
CG57051-03   HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR



-PRD CQEL FQVGERQSGLFE IQPQG S PPFLVNCKMT S D 

_ _ _ H 



GGV-O- V I Q RRHD G S F N RP 

^gw v i q rrhd g s v d f n rp 

::;gvv^-v-i-qr-r-hdg-s-v-df-h-f.p 



™ A YICA G F GDP H GE F V.?L GL E K -/H S I TGD R N S RL A VQL R D WD GN A E L L Q F S VH L GG E D T A Y 
m A YICA G F GDP H GE F V7L GL E K^-Ti S I TGD R N S RL AVQL R DWD GN A E L L Q F S VH L GG E D T AY 
i.^/E A YI-Lf^ G F GDP H GE FTO GL E KVH S I TGD R N S RL A VQL R D V^D iBbT A E L L Q F S VH L GG E D T A Y 



; L QL T A P V AGQ L GAT T VP P S GL S VP F S T VT:'QDHDL RR D KN C AK S L 
:LQLTAP VAGQLGATTVPPSGLSVPFSTV/T'QDHDLRRDKN caksl 
: L QL T A P V AGQ L GAT T VP P S GL S VP Fi'JtV./DQDHDL RR D KN C AK S L 



SAPS VAQRPDHVP S P 



CG57051-04 a 

CG57051-02 LTPAg 

Q9HBV4 

CG57051-03 

CG57051-04 
CG57051-02 
Q9HBV4 
CG57051-03 



VAVFGTC SHS NLNGQYFR3 I PQQRQKLKJCGI FV.TCT \TOG R YYP L QATTML I QPMAA 
GVAVFGTC S HS N LNGQYFRS I PQQ RQI^XKiCG LFV/^T WRG R YYP L QATTML I QPMAA 
iGGWFGTCSHSNLNGQYFRS I PQQ RQICLI^CG I FWKT V/RG R YYP L QATTML I QPMAA 
N GGVAVFGTC S HS N LNGQYFRS I PQQRQKLKICG I FV.^CT V.?RG R YYP L QATTML I OPMAA 



Information for the ClustalW proteins:

Accno                        Common Name                                  Length
CG57051-04 (SEQ ID NO: 51)   novel Angiopoietin-like protein              242
CG57051-02 (SEQ ID NO: 55)   Angiopoietin Related protein / PPAR-gamma    386
Q9HBV4 (SEQ ID NO: 80)       ANGIOPOIETIN-LIKE PROTEIN PP1158             406
CG57051-03 (SEQ ID NO: 57)   Angiopoietin-like protein, isoform 3         368



In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.
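The three shading categories can be expressed as a per-column rule. The sketch below is a simplification for illustration only: the legend names L, V, I, and M merely as an example similarity group, so `SIMILARITY_GROUPS` contains just that group (any further groups would be an assumption), and gapped columns are treated as neither conserved nor similar, which the legend does not address. The function name `classify_column` is hypothetical.

```python
# Only the L/V/I/M group is taken from the legend; no other groups are assumed.
SIMILARITY_GROUPS = [frozenset("LVIM")]


def classify_column(column: str) -> str:
    """Classify one alignment column, given as one residue per sequence."""
    if "-" in column:
        return "neither"  # assumption: a gap breaks conservation
    residues = set(column)
    if len(residues) == 1:
        return "identical"  # black outlined in the alignment
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return "similar"  # gray background
    return "neither"  # white background


for col in ("LLLL", "LVIM", "LVLK", "LL-L"):
    print(col, classify_column(col))
```

A fuller version would need the complete set of similarity groups used by the alignment program, which the source does not list.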

Table 27. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-04.

endoplasmic reticulum (membrane) --- Certainty=0.8200(Affirmative) < succ>
plasma membrane --- Certainty=0.1900(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.1701(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7 - 23 ( 4 - 25)

Seems to be a Type Ib (Nexo Ccyt) membrane protein

Is the sequence a signal peptide?
# Measure  Position  Value  Cutoff  Conclusion
  max. C   31        0.427  0.37    YES
  max. Y   31        0.473  0.34    YES
  max. S   8         0.952  0.88    YES
  mean S   1-30      0.738  0.48    YES
# Most likely cleavage site between pos. 30 and 31: VQS-KS
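The YES/NO column of the SignalP output follows from comparing each score with its cutoff, and the predicted cleavage site fixes the length of the putative mature peptide. A minimal sketch, assuming that a score strictly above its cutoff counts as YES and that cleavage occurs exactly at the predicted site between positions 30 and 31 of the 242-residue CG57051-04 protein (length taken from the ClustalW information table). The names `conclusions` and `mature_length` are hypothetical.

```python
# (measure, value, cutoff) copied from the SignalP rows above.
SIGNALP_ROWS = [
    ("max. C", 0.427, 0.37),
    ("max. Y", 0.473, 0.34),
    ("max. S", 0.952, 0.88),
    ("mean S", 0.738, 0.48),
]


def conclusions(rows):
    """Map each measure to YES/NO, assuming YES means value > cutoff."""
    return {name: "YES" if value > cutoff else "NO" for name, value, cutoff in rows}


def mature_length(full_length: int, last_signal_residue: int) -> int:
    """Residues remaining after removing a signal peptide ending at `last_signal_residue`."""
    return full_length - last_signal_residue


print(conclusions(SIGNALP_ROWS))
print(mature_length(242, 30))  # 212
```

Every measure clears its cutoff, matching the four YES entries. Under the stated assumption the mature peptide would be 212 residues long; since the same table also reports a putative transmembrane segment at residues 7-23, that figure only applies if the N-terminal segment is in fact cleaved.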



Hydrophobicity Plot for CG57051-04 with a window of 19 (x-axis: Amino Acid Number, 0 to 250)
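The hydrophobicity plot is a sliding-window average of per-residue hydropathy. The source does not name the scale, so the sketch below assumes the widely used Kyte-Doolittle values. It applies the stated window of 19 to the first 40 residues of CG57051-04, as printed in the BLASTP query row of Table 24. It is an illustration, not a reproduction of the original plot, and the function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the source does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i covers residues i+1..i+window."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


# First 40 residues of CG57051-04 (BLASTP query row in Table 24).
n_terminus = "MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDE"
profile = hydropathy_profile(n_terminus)
print(len(profile), round(profile[0], 2), round(profile[-1], 2))  # 22 1.37 -1.12
```

On this scale the window covering residues 1-19 is clearly hydrophobic, while the window ending at residue 40 is hydrophilic. This is consistent with the N-terminal hydrophobic segment reported by PSORT (residues 7-23) and SignalP.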
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SECP 16 

A SECP16 nucleic acid and polypeptide according to the invention were obtained by
exon linking and include the nucleic acid sequence (SEQ ID NO:52) and encoded polypeptide
sequence (SEQ ID NO:53) of clone CG57051-05, directed toward novel Angiopoietin-like
proteins and nucleic acids encoding them. Figure 21 illustrates the nucleic acid and
amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:52)
of 1239 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an
ATG initiation codon at nucleotides 80-82 and ending with a TAG stop codon at nucleotides
1184-1186. Putative untranslated regions, if any, are found upstream from the initiation codon
and downstream from the termination codon. The encoded protein of 368 amino acid
residues is presented using the one-letter code in Figure 21. The protein encoded by clone
CG57051-05 is predicted by the PSORT program to be located extracellularly with a certainty of
0.7332 and has a signal peptide (see Table 28 below). The PCR product derived by exon linking,
covering the entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to
provide clone 157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5.
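The stated coordinates can be checked against the protein length: an ORF running from the first base of the ATG (nucleotide 80) to the last base of the TAG (nucleotide 1186) spans 1107 bases, or 369 codons, and the stop codon contributes no residue. A minimal check, assuming 1-based inclusive coordinates as used in the text:

```python
def encoded_length(start: int, stop_end: int) -> int:
    """Residues encoded by an ORF given 1-based inclusive coordinates.

    `start` is the first base of the initiation codon and `stop_end`
    the last base of the stop codon; the stop codon adds no residue.
    """
    span = stop_end - start + 1
    if span % 3:
        raise ValueError(f"ORF span of {span} bases is not a whole number of codons")
    return span // 3 - 1


print(encoded_length(80, 1186))  # 368
```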

Similarities 

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 867 of 1064 bases (81%) identical to a gb:GENBANK-
ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (see Table 24). The full amino acid sequence of the
protein of the invention was found to have 185 of 192 amino acid residues (96%) identical to,
and 185 of 192 amino acid residues (96%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE
PROTEIN PP1158) (see Table 25).
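The quoted percentages are the identity counts divided by the aligned length, reduced to a whole percent. The search tables in this section are consistent with truncation rather than rounding (for example, 526/527 in Table 26 is reported as 99%, not 100%), so the sketch below assumes truncation:

```python
def percent_identity(identical: int, aligned: int) -> int:
    """Whole-number percent identity, truncated toward zero."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return identical * 100 // aligned


for ident, total in [(867, 1064), (185, 192), (526, 527)]:
    print(f"{ident}/{total} -> {percent_identity(ident, total)}%")
```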

A multiple sequence alignment is given in Table 27, with the protein of the invention
being shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Note that this sequence represents a splice form of Angiopoietin
that lacks exon 4, as indicated at positions 183 to 221, and carries the SNPs V156G, A157G and T266M.
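The SNP positions appear to use two numberings. V156G and A157G lie before the missing exon, where CG57051-05 and Q9HBV4 share residue numbers (the second BLASTP segment in Table 25 aligns residues 1-182 of both). T266M, however, appears to be numbered on Q9HBV4: in the first BLASTP segment, query residue 177 aligns with subject residue 215 without gaps, so positions differ by a constant 38. A sketch of that conversion, assuming an ungapped segment:

```python
def map_ungapped(q_pos: int, q_start: int, s_start: int, seg_len: int) -> int:
    """Map a query position to the subject within an ungapped aligned segment."""
    offset = q_pos - q_start
    if not 0 <= offset < seg_len:
        raise ValueError("position lies outside the aligned segment")
    return s_start + offset


# Table 25, first segment: Query 177-236 aligned with Sbjct 215-274 (60 residues, no gaps)
query_line = "NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA"
q_pos = 228
print(query_line[q_pos - 177], "at query", q_pos, "-> subject", map_ungapped(q_pos, 177, 215, 60))
```

The methionine at query position 228 lands on subject position 266, where Q9HBV4 has a threonine.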
The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the Interpro domain accession number. Significant domains are summarized below:

Model          Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
fibrinogen_C   1/2     184    246       47     123 ..    98.2   4e-27
fibrinogen_C   2/2     288    362 ..    178    272 .]    67.0   3.4e-18

IPR002181; (Fibrinogen_C)

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx

'C': conserved cysteine involved in a disulfide bond.

(SEQ ID NO:126)
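The schematic writes the domain layout as a pattern string in which 'x' stands for any residue and 'C' for a conserved cysteine. A small helper can read the cysteine positions directly from the schematic as drawn; this is only an illustration of the notation, and the helper name is hypothetical:

```python
SCHEMATIC = "xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx"


def cysteine_positions(pattern: str) -> list[int]:
    """1-based positions of 'C' in a schematic pattern string."""
    return [i + 1 for i, ch in enumerate(pattern) if ch == "C"]


positions = cysteine_positions(SCHEMATIC)
print(positions)
```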

Such a domain has recently been found in other proteins, which are listed below:

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These are proteins
of about 260 amino acids that have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function.

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be
involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc.
No. CG57051-05.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profiles for the Angiopoietin-like protein are shown
in Table 28. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to
be cleaved between amino acids 25 and 26: AQG-GP.
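To make the predicted cleavage concrete, the start of the ORF (nucleotides 80 to 172 of the query sequence shown in Table 24) can be translated and cut between residues 25 and 26. The sketch uses the standard genetic code and assumes the nucleotide fragment has been read correctly from Table 24; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest)
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(nt: str) -> str:
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(nt) - len(nt) % 3, 3):
        aa = CODON_TABLE[nt[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def split_signal_peptide(protein: str, cleave_after: int) -> tuple[str, str]:
    """Split at a predicted cleavage site between positions n and n+1 (1-based)."""
    return protein[:cleave_after], protein[cleave_after:]


# Nucleotides 80-172 of CG57051-05 (query sequence of Table 24)
orf_start = ("ATGAGCGGCGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC"
             "CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAG")
peptide = translate(orf_start)
signal, mature = split_signal_peptide(peptide, 25)
print(peptide)          # MSGAPTAGAALMLCAATAVLLSAQGGPVQSK
print(signal, mature)   # MSGAPTAGAALMLCAATAVLLSAQG GPVQSK
```

The signal peptide ends in AQG and the predicted mature chain begins GP, matching the SignalP call.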
Functional Variants and Homologs

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 21, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 21 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-05, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 21. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 21 while the protein
still maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 4% of the amino acid
residues may be so changed.
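These ceilings appear to be the complements of the 81% nucleotide and 96% amino acid identities given under Similarities. Expressed as counts, they come to roughly 235 changed bases in the 1239-bp sequence and 14 changed residues in the 368-residue protein. The helper below floors the product with integer arithmetic; treating "up to about" as a hard floor is an assumption made only for illustration.

```python
def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions within `percent` of `length`."""
    return length * percent // 100


print(max_changes(1239, 19))  # 235 bases
print(max_changes(368, 4))    # 14 residues
```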

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-05 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-05 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.
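At the sequence level, the arrangement described (a second polypeptide at the amino terminus and a third at the carboxyl terminus of the Angiopoietin-like protein) amounts to concatenating one-letter sequences in order. In this sketch the core sequence is a truncated, hypothetical placeholder (the full sequence is in Figure 21), and no linker residues are inserted, which is a simplifying assumption:

```python
HIS6 = "HHHHHH"  # (His)6 specificity motif


def build_fusion(core: str, n_term: str = "", c_term: str = "") -> str:
    """Join optional N- and C-terminal partners to a core polypeptide (no linkers)."""
    if not core:
        raise ValueError("core polypeptide is required")
    return n_term + core + c_term


core = "GPVQSK"  # hypothetical, truncated placeholder for the CG57051-05 polypeptide
print(build_fusion(core, c_term=HIS6))  # GPVQSKHHHHHH
```

In practice a flexible linker is often placed between partners; adding one would just mean extending `n_term` or `c_term`.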

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 

Antibodies 

The invention further encompasses antibodies and antibody fragments, such as Fab,
(Fab)2 or single chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; Cone-rod retinal dystrophy-2; DNA ligase I deficiency;
Glutaricaciduria, type IIB; Liposarcoma; Myotonic dystrophy; as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind
immunospecifically to the novel substances of the invention for use in diagnostic and/or
therapeutic methods.

Table 24. BLASTN search using CuraGen Acc. No. CG57051-05.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:87)
Length = 1943

Plus Strand HSPs:

Score = 3105 (465.9 bits), Expect = 2.0e-134, P = 2.0e-134
Identities = 867/1064 (81%), Positives = 867/1064 (81%), Strand = Plus / Plus

Query:    4 CGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCTCC 63
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   97 CGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGTCCCCGAATCCCCGCTCC 156

Query:   64 CAGGCTACCTAAGAGGATGAGCGGCGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 123
            |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct:  157 CAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCAGCCCTGATGCTCTGCGC 216

Query:  124 CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAGTCGCCGCGCTT 183
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  217 CGCCACCGCCGTGCTACTGAGCGCTCAGGGCGGACCCGTGCAGTCCAAGTCGCCGCGCTT 276

Query:  184 TGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCTGCAGCTCGGCCAGGGGCT 243
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  277 TGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCTGCAGCTCGGCCAGGGGCT 336

Query:  244 GCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTGAGCGC 303
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  337 GCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCTGGAGCGGCGCCTGAGCGC 396

Query:  304 GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 363
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  397 GTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCTCCCGTTAGCCCCTGAGAG 456

Query:  364 CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAG 423
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  457 CCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACTCAAGGCTCAGAACAGCAG 516

Query:  424 GATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTGGAGAAGCAGCACCT 483
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  517 GATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCACCTGGAGAAGCAGCACCT 576

Query:  484 GCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGA 543
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  577 GCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGA 636

Query:  544 GGGTGGC-AAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGG 602
            || |||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  637 GG-TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGG 695

Query:  603 CTCACAATGTCAGCCGCCTGCACCA--TGG--AGGC-TGGACAGTAA-T-TCAGAGGC-G 654
            ||||||||||||||||||||||||   ||   |||  | | ||| |  | |   |||  |
Sbjct:  696 CTCACAATGTCAGCCGCCTGCACCGGCTGCCCAGGGATTGCCAGGAGCTGTTCCAGGTTG 755

Query:  714 ATCCCCACGGCGAGTTCTGGCTGG-GTCTGGAGAAGGTGCATAGCATCATGGGGGACCGC 772
            | |  || |  ||  || |  ||| | ||||| | | |  ||  ||  |  || | || |
Sbjct:  815 AACTGCAAGATGACCTCAGA-TGGAGGCTGGACA-G-TA-ATT-CAG-A--GGCG-CCAC 865

Query:  773 AACAGCCGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAAC--GCCGAGTTGCTGCAGT 830
             |  ||    ||| | | || |  |||  |||||| | | ||  | || | |  ||  |
Sbjct:  866 GATGGCTCAGTGGACTT-CAAC--CGGCCCTGGGAAGCCTACAAGGCGGGGTT-TGGGGA 921

Query:  831 TCTCCGTG-C-AC--CTGGGTGGCGA-GGACACGGCCTATAGCCTG-CAGCTCACTGCAC 884
            || ||  | | |   |||| |||    ||| | ||   ||||| |  | |   || |||
Sbjct:  922 TCCCCACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAA 981

Query:  885 CCGTGGCC-GGCCA-GCTGG-GCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCT 941
            | |  ||| ||||  || |  |||  ||     |  || |  || |  | | | | |  |
Sbjct:  982 CAGCCGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAAC--GCCGAGT-TGC-TGCAGT 1037

Sbjct: 1038 -TGC-ACCTGGGTGGCGAGGACA-GGCCTATAGC-CTGCAGCTCACTGCAC 1091

Sbjct: 1092 C-C--GTGGCCGGCCAGCTGGGCGCCACCA-CCGTCCCA-CC-CAGCGGCCTCTCCGTAC 1145

Score = 3041 (457.3 bits), Expect = 7.4e-132, P = 7.4e-132
Identities = 658/699 (94%), Positives = 658/699 (94%), Strand = Plus / Plus

Query:  541 TGAGG-GTGGCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACC 599
            || || | |||| |  ||  | |    | |  | |||  ||  | | | |||| ||
Sbjct:  754 TGGGGAGAGGCA-GAGTGGACTATTTGAAATCCAGCCTCAGGGGTCTCCGCCATTTTT-- 810

Query:  600 CGGCTCACAATGTCAGCCG-CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCAC 658
             || | | | ||  ||  | ||| ||  ||||||||||||||||||||||||||||||||
Sbjct:  811 -GG-TGA-ACTGCAAGATGACCT-CAG-ATGGAGGCTGGACAGTAATTCAGAGGCGCCAC 865

Query:  659 GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCC 718
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  866 GATGGCTCAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCC 925

Query:  719 CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCATGGGGGACCGCAACAGC 778
            ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Sbjct:  926 CACGGCGAGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGC 985

Query:  779 CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 838
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  986 CGCCTGGCCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTG 1045

Query:  839 CACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAG 898
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1046 CACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAG 1105

Query:  899 CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAG 958
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1106 CTGGGCGCCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAG 1165

Query:  959 GATCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1018
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1166 GATCACGACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTT 1225

Query: 1019 GGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGG 1078
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1226 GGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGG 1285

Query: 1079 CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1138
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1286 CAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAG 1345

Query: 1139 GCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGG 1198
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1346 GCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAGCGTCCTGGCTGG 1405

Query: 1199 GCCTGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG 1239
            |||||||||||||||||||||||| |||||||||||||||||
Sbjct: 1406 GCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG 1447
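The identity counts in these BLAST tables are the number of alignment columns in which query and subject carry the same base. Gap columns widen the alignment but never count as identities. A minimal sketch, checked against the last line of the second HSP above (Query 1199-1239 against Sbjct 1406-1447):

```python
def count_identities(query: str, subject: str) -> tuple[int, int]:
    """Return (identical columns, aligned columns) for two gapped strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    same = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return same, len(query)


q = "GCCTGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG"
s = "GCCTGGTCCCAGGCCCACGAAAGACGGTGACTCTTGGCTCTG"
print(count_identities(q, s))  # (41, 42)
```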



Table 25. BLASTP search using the protein of CuraGen Acc. No. CG57051-05.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO:88)
Length = 406

Score = 1015 (357.3 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 185/192 (96%), Positives = 185/192 (96%)

Query:  177 NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 236
            |      |||||||||||||||||||||||||||||||||||||||||||| ||||||||
Sbjct:  215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 274

Query:  237 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 296
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query:  297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 356
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  335 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 394

Query:  357 MLIQPMAAEAAS 368
            ||||||||||||
Sbjct:  395 MLIQPMAAEAAS 406

Score = 923 (324.9 bits), Expect = 1.6e-197, Sum P(2) = 1.6e-197
Identities = 180/182 (98%), Positives = 180/182 (98%)

Query:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEGGKPARRKRLPEMAQPVDPAHNVSR 180
            |||||||||||||||||||||||||||||||||||  |||||||||||||||||||||||
Sbjct:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query:  181 LH 182
            ||
Sbjct:  181 LH 182



Table 26. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using CuraGen
Acc. No. CG57051-05.

>s3aq:217939973, 631 bp. (SEQ ID NO:89)
Length = 631

Minus Strand HSPs:

Score = 2620 (393.1 bits), Expect = 9.1e-113, P = 9.1e-113
Identities = 526/527 (99%), Positives = 526/527 (99%), Strand = Minus / Plus

Query: 1239 CAGAGCCAAGAGTCACC-TCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1181
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Sbjct:  105 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 164

Query: 1180 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  165 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 224

Query: 1120 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1061
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  225 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 284

Query: 1060 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1001
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  285 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 344

Query: 1000 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 941
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  345 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 404

Query:  940 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 881
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  405 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 464

Query:  880 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 821
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  465 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 524

Query:  820 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 761
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  525 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 584

Query:  760 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGAT 714
            |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  585 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGAT 631



>s3aq:230121563, 788 bp. (SEQ ID NO:90)
Length = 788

Minus Strand HSPs:

Score = 2583 (387.6 bits), Expect = 3.4e-111, P = 3.4e-111
Identities = 533/548 (97%), Positives = 533/548 (97%), Strand = Minus / Plus

Query: 1239 CAGAGCCAAGAGTCACC-TCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 1181
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Sbjct:  171 CAGAGCCAAGAGTCACCGTCTTTCGTGGGCCTGGGACCAGGCCCAGCCAGGACGCTAGGA 230

Query: 1180 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 1121
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  231 GGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTAGTAGCG 290

Query: 1120 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 1061
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  291 GCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGGGATGGA 350

Query: 1060 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 1001
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  351 GCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCCTCCAGA 410

Query: 1000 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 941
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  411 GAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAA 470

Query:  940 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 881
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  471 GGGTACGGAGAGGCCGCTGGGTGGGACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGC 530

Query:  880 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 821
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  531 AGTGAGCTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAA 590

Query:  820 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCAT 761
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Sbjct:  591 CTCGGCGTTGCCATCCCAGTCCCGCAGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGT 650

Query:  760 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGC 701
            ||||||||||||||||||||||||||||||||||||||| ||| |      |  || | |
Sbjct:  651 GATGCTATGCACCTTCTCCAGACCCAGCCAGAACTCGCC-TGGAGTGGGAGAGGCCACTC 709



>s3aq:217940431 Category E: 530 bp. (SEQ ID NO:91)
Length = 530

Minus Strand HSPs:

Score = 1795 (269.3 bits), Expect = 2.0e-75, P = 2.0e-75
Identities = 381/399 (95%), Positives = 381/399 (95%), Strand = Minus / Plus

Query:  553 CTTGCCACCCTCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTTGCAG 497
            || | | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||||||
Sbjct:  132 CTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTTGCAG 189

Query:  496 ATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 437
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  190 ATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGTGGAA 249

Query:  436 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 377
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  250 GAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGACCTC 309

Query:  376 AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 317
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  310 AGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCTGACA 369

Query:  316 GGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGCGCTC 257
            ||||||||||||||||||||||||||| | ||||||||||||||||||||||||||||||
Sbjct:  370 GGCGGACCCGCACGCGCTCAGGCGCCGTTTCAGCGCGCTCAGCTGACTGCGGGTGCGCTC 429

Query:  256 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCATCTC 197
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct:  430 CGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGAGATTCATCTC 489

Query:  196 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 156
            |||||||||||||||||||||||||||||||||||||||||
Sbjct:  490 GTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTC 530



>s3aq:217940613, 336 bp. (SEQ ID NO:92)
Length = 336

Minus Strand HSPs:

Score = 995 (149.3 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56
Identities = 203/204 (99%), Positives = 203/204 (99%), Strand = Minus / Plus

Query:  626 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 567
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  133 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 192

Query:  566 TTCTTCGGGCAGGCTTG-CCACCCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAAC 508
            ||||||||||||||||| ||||| ||||||||||||||||||||||||||||||||||||
Sbjct:  193 TTCTTCGGGCAGGCTTGGCCACC-TCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAAC 251

Query:  507 TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 448
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  252 TGGCTTTGCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCC 311

Query:  447 ACCTTGTGGAAGAGTTGCTGGATCC 423
            |||||||||||||||||||||||||
Sbjct:  312 ACCTTGTGGAAGAGTTGCTGGATCC 336

Score = 410 (61.5 bits), Expect = 9.4e-56, Sum P(2) = 9.4e-56 (SEQ ID NO:129)
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus

Query:  717 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 658
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60

Query:  657 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 627
            |||||||||||||||| ||||||  || | |
Sbjct:   61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91



>s3aq:217939964, 328 bp. (SEQ ID NO:93)
Length = 328

Plus Strand HSPs:

Score = 762 (114.3 bits), Expect = 1.5e-28, P = 1.5e-28
Identities = 156/159 (98%), Positives = 156/159 (98%), Strand = Plus / Plus

Query: 1202 TGGTCCCAGGCCCACGAAAGA-GGTGACTCTTGGCTCTG 1239
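For the Minus-strand hits, query coordinates run downward because the SeqCalling fragment matches the reverse complement of CG57051-05. A minimal check: the reverse complement of query bases 1223-1239, read forward from the Table 24 query, equals CAGAGCCAAGAGTCACC, the first 17 bases of the Minus-strand query lines that start at position 1239.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


forward_1223_1239 = "GGTGACTCTTGGCTCTG"
print(reverse_complement(forward_1223_1239))  # CAGAGCCAAGAGTCACC
```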



Table 27. ClustalW alignment of CG57051-05 protein with related proteins. 
[ClustalW alignment image (shaded residues) not reproducible as text. Sequences aligned: CG57051-05 (residues 1-368), Q9HBV4 (1-406), CG57051-04 (1-242) and CG57051-02 (1-386).]


Information for the ClustalW proteins:

Accno                        Common Name                              Length
CG57051-05 (SEQ ID NO:53)    novel Angiopoietin-like protein          368
CG57051-04 (SEQ ID NO:51)    Angiopoietin-like protein - isoform 4    242
CG57051-02 (SEQ ID NO:55)    Angiopoietin-like protein - isoform 2    386
Q9HBV4 (SEQ ID NO:80)        ANGIOPOIETIN-LIKE PROTEIN PP1158         406



In the alignment shown above, black outlined amino acid residues indicate residues identically conserved between sequences (i.e., residues that may be required to preserve structural or functional properties); amino acid residues with a gray background are similar to one another between sequences, possessing comparable physical and/or chemical properties without altering protein structure or function (e.g., the group L, V, I and M may be considered similar); and amino acid residues with a white background are neither conserved nor similar between sequences.
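The three shading classes described above amount to a per-column test on each pair of aligned residues. The following is a minimal sketch of that test. It assumes a single similarity group {L, V, I, M}, which is the only group the text names; a full analysis would use additional groups, and the helper name is illustrative.

```python
# Only the L/V/I/M group is named in the text; any further groups are omitted.
SIMILAR_GROUPS = [frozenset("LVIM")]


def classify_column(a: str, b: str) -> str:
    """Classify two aligned residues as 'identical', 'similar' or 'different'."""
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


print(classify_column("L", "L"))  # identical
print(classify_column("L", "V"))  # similar
print(classify_column("L", "K"))  # different
```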

Table 28. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-05.

outside --- Certainty=0.7332(Affirmative) < succ>
microbody (peroxisome) --- Certainty=0.2608(Affirmative) < succ>
endoplasmic reticulum (membrane) --- Certainty=0.1000(Affirmative) < succ>
endoplasmic reticulum (lumen) --- Certainty=0.1000(Affirmative) < succ>

Is the sequence a signal peptide?
# Measure  Position  Value  Cutoff  Conclusion
max. C     31        0.306  0.37    NO
max. Y     26        0.429  0.34    YES
max. S     8         0.952  0.88    YES
mean S     1-25      0.848  0.48    YES
# Most likely cleavage site between pos. 25 and 26: AQG-GP
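The Conclusion column in the SignalP output above can be read as a threshold test on each score. The sketch below assumes that each conclusion is simply whether the reported value exceeds its cutoff. The function name `signalp_conclusions` is hypothetical and is not part of SignalP.

```python
# Hypothetical helper: reproduce the YES/NO "Conclusion" column of a
# SignalP-style report, assuming YES simply means value > cutoff.
def signalp_conclusions(rows):
    """rows: iterable of (measure, value, cutoff) tuples -> {measure: 'YES'/'NO'}."""
    return {measure: ("YES" if value > cutoff else "NO")
            for measure, value, cutoff in rows}


# Scores and cutoffs copied from Table 28 (CG57051-05).
TABLE_28 = [
    ("max. C", 0.306, 0.37),
    ("max. Y", 0.429, 0.34),
    ("max. S", 0.952, 0.88),
    ("mean S", 0.848, 0.48),
]

for measure, verdict in signalp_conclusions(TABLE_28).items():
    print(f"{measure:8s}{verdict}")
```

Only max. C falls below its cutoff here, which matches the single NO in the table.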

SECP17 

A SECP17 nucleic acid and polypeptide according to the invention includes the nucleic acid sequence (SEQ ID NO:54) and encoded polypeptide sequence (SEQ ID NO:55) of clone CG57051-02, directed toward novel Angiopoietin-like proteins and nucleic acids encoding them. Figure 22 illustrates the nucleic acid and amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:54) of 1315 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 1313-1315. Putative untranslated regions, if any, are found upstream from the initiation codon and downstream from the termination codon. The encoded protein, having 386 amino acid residues, is presented using the one-letter code in Figure 22. The protein encoded by clone CG57051-02 is predicted by the PSORT program to be located extracellularly with a certainty of 0.7332 and has a signal peptide (see Table 33 below). The PCR product derived by exon linking, covering the entire open reading frame, was cloned into the pCR2.1 vector from Invitrogen to provide clone 157544::CG50847-01.891637.M13 and clone 157544::CG50847-01.891637.O5. SeqCalling procedures were also utilized to identify CG57051-02, and the following public components were thus included in the invention: gb_accno: AC010323 Homo sapiens chromosome 19 clone CTD-2550O8, WORKING DRAFT SEQUENCE, 55 unordered pieces. In addition, the following CuraGen Corporation SeqCalling Assembly IDs were also included in the invention: 162377751. The DNA and protein sequences for the novel Angiopoietin-like gene are reported here as CuraGen Acc. No. CG57051-02.
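The stated coordinates can be checked arithmetically. An ORF running from the first base of the ATG to the last base of the stop codon spans a whole number of codons, and the protein length is that codon count minus the stop codon. The following is a short Python sketch of the check; the helper name is illustrative and does not come from the source.

```python
def orf_protein_length(atg_start: int, stop_end: int) -> int:
    """Residues encoded by an ORF given 1-based, inclusive coordinates running
    from the A of the ATG start codon to the last base of the stop codon."""
    span = stop_end - atg_start + 1  # bases in the ORF, stop codon included
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bases is not a whole number of codons")
    return span // 3 - 1  # the stop codon encodes no residue


# CG57051-02: ATG at 155-157, TAG at 1313-1315 -> 1161 bases = 387 codons
print(orf_protein_length(155, 1315))  # 386
# CG57051-04: ATG at 155-157, TAG at 881-883 -> 729 bases = 243 codons
print(orf_protein_length(155, 883))   # 242
```

The same arithmetic applied to CG57051-03 (nucleotides 44-1150) gives 368 residues.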

Similarities 

Clone CG57051-04 is directed toward novel Angiopoietin-like proteins and nucleic acids encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883. Putative untranslated regions, if any, are found upstream from the initiation codon and downstream from the termination codon. The encoded protein, having 242 amino acid residues, is presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of 0.8200, and appears to be a signal protein (see Table 27 below).

In a search of sequence databases, it was found, for example, that the nucleic acid sequence of this invention has 696 of 700 bases (99%) identical to a gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like protein PP1158 mRNA, complete cds) (Table 29). The full amino acid sequence of the protein of the invention was found to have 179 of 182 amino acid residues (98%) identical to, and 180 of 182 amino acid residues (98%) similar to, the 406 amino acid residue ptnr:SPTREMBL-ACC:Q9NZU4 protein from Homo sapiens (Human) (HEPATIC ANGIOPOIETIN-RELATED PROTEIN) (Table 30).
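The identity figures quoted above are ratios of identical positions to aligned positions. The following is a minimal Python sketch with hypothetical helper names. It assumes that the reported percentages are truncated to whole numbers rather than rounded. That assumption is consistent with the tables, which give 312/407 (76.7%) as 76% and 180/182 (98.9%) as 98%.

```python
def count_identities(a: str, b: str) -> int:
    """Number of identical positions in two equal-length, ungapped aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    return sum(x == y for x, y in zip(a, b))


def percent_truncated(num: int, den: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumed convention)."""
    return 100 * num // den


# Residues 1-60 of CG57051-02 (query) and Q9NZU4 (subject), copied from Table 30
query = "MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"
sbjct = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"

print(count_identities(query, sbjct))  # 58: residues 24 (R/Q) and 25 (S/G) differ
print(percent_truncated(696, 700))     # 99
print(percent_truncated(179, 182))     # 98
print(percent_truncated(180, 182))     # 98 (rounding would give 99)
```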

A multiple sequence alignment is given in Table 32, with the protein of the invention 
being shown on the first line in a ClustalW analysis comparing the protein of the invention with 
related protein sequences. 

The presence of identifiable domains in the protein disclosed herein was determined by searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints, and the domains were then identified by their InterPro domain accession numbers. Significant domains are summarized below:
HMMER 2.1.1 (Dec 1998)
Copyright (C) 1992-1998 Washington University School of Medicine
HMMER is freely distributed under the GNU General Public License (GPL).

HMM file: pfamHMMs

Query: CG57051_02

Scores for sequence family classification (score includes all domains):
Model         Description                                  Score    E-value  N
fibrinogen_C  Fibrinogen beta and gamma chains, C-term     143.9    3.?e-40  2

Parsed for domains:
Model         Domain  seq-f seq-t    hmm-f hmm-t      score  E-value
fibrinogen_C    1/2     184   246 ..    47   123 ..   102.5  2.4e-28
fibrinogen_C    2/2     288   380 ..   178   272 .]    43.4  1.9e-11

Alignments of top-scoring domains:
fibrinogen_C: domain 1 of 2, from 184 to 246: score 102.5, E = 2.4e-28
fibrinogen_C: domain 2 of 2, from 288 to 380: score 43.4, E = 1.9e-11



IPR002181: Fibrinogen [1], the principal protein of vertebrate blood clotting, is a hexamer containing two sets of three different chains (alpha, beta and gamma), linked to each other by disulfide bonds. The N-terminal sections of these three chains are evolutionarily related and contain the cysteines that participate in the cross-linking of the chains. However, there is no similarity between the C-terminal part of the alpha chain and that of the beta and gamma chains. The C-terminal part of the beta and gamma chains forms a domain of about 270 amino-acid residues. As shown in the schematic representation, this domain contains four conserved cysteines involved in two disulfide bonds.

xxxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx
    |            |                                 |     |
    +------------+                                 +-----+

'C': conserved cysteine involved in a disulfide bond. (SEQ ID NO:126)
Such a domain has recently been found [2] in the following other proteins: two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B), which are proteins of about 260 amino acids having a fibrinogen beta/gamma C-terminal domain; the C-terminus of the Drosophila protein scabrous (gene sca), which is involved in the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell differentiation; the C-terminus of a mammalian T-cell-specific protein of unknown function; and the C-terminus of a human protein of unknown function that is encoded on the opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested [2] that it could be involved in protein-protein interactions.

This indicates that the sequence of the invention has properties similar to those of other proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19q13.3. This assignment was made using mapping information associated with genomic clones, public genes and ESTs sharing sequence identity with the disclosed sequence, and CuraGen Corporation's Electronic Northern bioinformatic tool.

Tissue expression 

The Angiopoietin-like gene disclosed in this invention is expressed in at least the following tissues: adipocytes. Expression information was derived from the tissue sources of the sequences that were included in the derivation of the sequence of CuraGen Acc. No. CG57051-02.

Cellular Localization and Sorting

The PSORT, SignalP and hydropathy results for the Angiopoietin-like protein are shown in Table 33. Although PSORT suggests that the Angiopoietin-like protein may be localized in the nucleus, the protein of CuraGen Acc. No. CG57051-02 predicted here is similar to the Angiopoietin family, some members of which are secreted. Therefore, it is likely that this novel Angiopoietin-like protein is localized to the same sub-cellular compartment.

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the nucleic acid whose sequence is provided in Figure 22, or a fragment thereof. The invention also includes a mutant or variant nucleic acid any of whose bases may be changed from the corresponding base shown in Figure 22 while still encoding a protein that maintains its Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid. The invention further includes nucleic acids whose sequences are complementary to the sequence of CuraGen Acc. No. CG57051-02, including nucleic acid fragments that are complementary to any of the nucleic acids just described. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications. Such modifications include, by way of non-limiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject. In the mutant or variant nucleic acids, and their complements, up to about 1% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence is provided in Figure 22. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residue shown in Figure 22 while the protein still maintains its Angiopoietin-like activities and physiological functions, or a functional fragment thereof. In the mutant or variant protein, up to about 2% of the amino acid residues may be so changed.
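As a worked example of these limits, the sketch below computes how many positions "up to about 1%" and "up to about 2%" correspond to. It uses the 1315 bp CG57051-02 sequence and its 386-residue protein. It assumes the limit is read as the largest whole number not exceeding the stated percentage of the sequence length. This is an illustrative reading, not a definition given in the source.

```python
def max_changed_positions(length: int, percent: int) -> int:
    """Largest whole number of positions that does not exceed `percent`% of `length`."""
    return length * percent // 100


print(max_changed_positions(1315, 1))  # 13 of the 1315 bases of CG57051-02
print(max_changed_positions(386, 2))   # 7 of the 386 residues of the encoded protein
```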

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab, F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the invention. Also encompassed within the invention are peptides and polypeptides comprising sequences having high binding affinity for any of the proteins of the invention, including such peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like protein may have important structural and/or physiological functions characteristic of the Angiopoietin family. Therefore, the nucleic acids and proteins of the invention are useful in potential diagnostic and therapeutic applications and as a research tool. These include serving as a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the presence or amount of the nucleic acid or the protein is to be assessed. These also include potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or treatment of various diseases and disorders. For example, the compositions of the present invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity; colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension; 3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I deficiency; glutaricaciduria, type IIB; liposarcoma; and myotonic dystrophy, as well as other diseases, disorders and conditions.

These materials are further useful in the generation of antibodies that bind immunospecifically to the novel substances of the invention for use in diagnostic and/or therapeutic methods.

Table 29. BLASTN search using CuraGen Acc. No. CG57051-02.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:94)
Length = 1943

Plus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 8.3e-233, Sum P(2) = 8.3e-233
Identities = 696/700 (99%), Positives = 696/700 (99%), Strand = Plus / Plus

Query:    2 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 61
Sbjct:   20 GCGGATCCTCACACGACTGTGATCCGATTCTTTCCAGCGGCTTCTGCAACCAAGCGGGTC 79

Query:   62 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 121
Sbjct:   80 TTACCCCCGGTCCTCCGCGTCTCCAGTCCTCGCACCTGGAACCCCAACGTCCCCGAGAGT 139

Query:  122 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 181
Sbjct:  140 CCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGACGGCCGGGGCA 199

Query:  182 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCT-AGATCTGGACCCGTGCA 240
Sbjct:  200 GCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGC-GGACCCGTGCA 258

Query:  241 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 300
Sbjct:  259 GTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCACGGACTCCT 318

Query:  301 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 360
Sbjct:  319 GCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCTGAGCGCGCT 378

Query:  361 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 420
Sbjct:  379 GGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTCCACCGACCT 438

Query:  421 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 480
Sbjct:  439 CCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCAGACACAACT 498

Query:  481 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 540
Sbjct:  499 CAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCAGCAGCGGCA 558

Query:  541 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 600
Sbjct:  559 CCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCA 618

Query:  601 CAAGCACCTAGACCATGAGGTGGCCAAACCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 660
Sbjct:  619 CAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGC 678

Query:  661 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 701
Sbjct:  679 CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC 719

Score = 1887
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Plus / Plus

Query:  754 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 813
Sbjct:  886 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 945

Query:  814 TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 873
Sbjct:  946 TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 1005

Query:  874 GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 933
Sbjct: 1006 GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 1065

Sbjct: 1066 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 1125

Query:  994 ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1053
Sbjct: 1126 ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1185

Score = 936 (140.4 bits), Expect = 6.1e-190, Sum P(2) = 6.1e-190
Identities = 312/407 (76%), Strand = Plus / Plus

Query:  909 CCGTGCACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCG 968
Sbjct:  993 CCGTGCAGCTGCGGGACTGGGAT--GGCA-AC-GCC-G-AGTTG-CTGCAGTTCT--CCG 1043

Query:  969 GCCAGCTGGGCGCC-ACCAC-CGTCCCAC--CCAGCGGCCTCTCCGTACCCTTCTCCACT 1024
Sbjct: 1044 TGCACCTGGGTGGCGAGGACACGGCCTATAGCCTGCAGC-TCACTGCACCCGTGGCCGGC 1102

Query: 1025 TGGGACCAGGATC-ACGACC-TCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGCCC 1082
Sbjct: 1103 CAG--CTGGGCGCCACCACCGTCC-CACCCAGCGGC-CT-CTCCGT-ACCCT-TCT-CCA 1154

Query: 1083 CATCGGT-GGCTCAAAGACCTGACCATGTTCCCT--CTCC-CCT-GACCCCGGCAGGA 1135
Sbjct: 1155 CTTGGGACCAGGATCAC-GACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGA 1213

Query: 1136 GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1195
Sbjct: 1214 GGCTGGTGGTTTGGCACCTGCAGCCATTCCAACCTCAACGGCCAGTACTTCCGCTCCATC 1273

Query: 1196 CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 1255
Sbjct: 1274 CCACAGCAGCGGCAGAAGCTTAAGAAGGGAATCTTCTGGAAGACCTGGCGGGGCCGCTAC 1333

Query: 1256 TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1315
Sbjct: 1334 TACCCGCTGCAGGCCACCACCATGTTGATCCAGCCCATGGCAGCAGAGGCAGCCTCCTAG 1393



Table 30. BLASTP search using the protein of CuraGen Acc. No. CG57051-02.

>ptnr:SPTREMBL-ACC:Q9NZU4 HEPATIC ANGIOPOIETIN-RELATED PROTEIN - Homo sapiens
(Human), 406 aa. (SEQ ID NO:95)
Length = 406

Score = 919 (323.5 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 179/182 (98%), Positives = 180/182 (98%)

Query:   1 MSGAPTAGAALMLCAATAVLLSARSGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
           |||||||||||||||||||||||+ |||||||||||||||||||||||||||||||||||
Sbjct:   1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPPHNVSR 180

Query: 181 LH 182
           ||
Sbjct: 181 LH 182

Score = 670 (235.9 bits), Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 123/132 (93%), Positives = 124/132 (93%)

Query: 177 NVSRLHHGGWTVIQRRHDGSMDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 236
           |      |||||||||||||+|||||||||||||||||||||||||||||| ||||||||
Sbjct: 215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSIMGDRNSRLA 274

Query: 237 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 296
           ||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||
Sbjct: 275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQFTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query: 297 LRRDKNCAKSLS 308
           ||||||||||||
Sbjct: 335 LRRDKNCAKSLS 346

Score = 331, Expect = 4.9e-194, Sum P(3) = 4.9e-194
Identities = 60/61 (98%)
(Query 326-386 aligned with Sbjct 346-406)

Score = 46 (16.2 bits), Expect = 5.9e-…
Identities = 19/40 (47%)

Sbjct:   1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDE 40

Score = 45 (15.8 bits), Expect = 7.6e-33, Sum P(2) = 7.6e-33
(Query 1 aligned with Sbjct 293)

Table 31. BLASTN identity search of CuraGen Corporation's Human SeqCalling database using CuraGen Acc. No. CG57051-02.

>s3aq:162377751 Category D:, 1920 bp. (SEQ ID NO:96)
Length = 1920

Minus Strand HSPs:

Score = 3448 (517.3 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233
Identities = 696/700 (99%), Positives = 696/700 (99%), Strand = Minus / Plus

Query:  701 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 642
Sbjct: 1221 GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT 1280

Query:  641 TTCTTCGGGCAGGTTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACT 582
Sbjct: 1281 TTCTTCGGGCAGGCTTGGCCACCTCATGGTCTAGGTGCTTGTGGTCCAGGAGGCCAAACT 1340

Query:  521 CCTTGTGGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGT 462
Sbjct: 1401 CCTTGTGGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGT 1460

Query:  461 GAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGG 402
Sbjct: 1461 GAAGGACCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGG 1520

Query:  401 TTCCCTGACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGC 342
Sbjct: 1521 TTCCCTGACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGC 1580

Query:  341 GGGTGCGCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 282
Sbjct: 1581 GGGTGCGCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGA 1640

Query:  281 CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCAGATCT-A 223
Sbjct: 1641 CATTCATCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCC-GCCCTGA 1699

Query:  222 GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 163
Sbjct: 1700 GCGCTCAGTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCA 1759

Query:  162 CCGCTCATCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTCGGGGACGTTGGGG 103
Sbjct: 1760 CCGCTCATCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTCGGGGACGTTGGGG 1819

Query:  102 TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 43
Sbjct: 1820 TTCCAGGTGCGAGGACTGGAGACGCGGAGGACCGGGGGTAAGACCCGCTTGGTTGCAGAA 1879

Score = 1887 (…33.1 bits), Expect = 1.5e-233, Sum P(2) = 1.5e-233 (SEQ ID NO:130)
Identities = 399/415 (96%), Positives = 399/415 (96%), Strand = Minus / Plus

Query: 1108 ATGG-T-CAGGTCTTTGAGCCACCGATGGGGCAGAGAGGCTCTTGGCGCAGTTCTTGTCC 1051
Sbjct:  700 ATGGCTGCAGGTGCCAAA-CCACC-AGCCTCCAGAGAGGCTCTTGGCGCAGTTCTTGTCC 757

Query: 1050 CTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAAGGGTACGGAGAGGCCGCTGGGTGGG 991
Sbjct:  758 CTGCGGAGGTCGTGATCCTGGTCCCAAGTGGAGAAGGGTACGGAGAGGCCGCTGGGTGGG 817

Query:  990 ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 931
Sbjct:  818 ACGGTGGTGGCGCCCAGCTGGCCGGCCACGGGTGCAGTGAGCTGCAGGCTATAGGCCGTG 877

Query:  930 TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 871
Sbjct:  878 TCCTCGCCACCCAGGTGCACGGAGAACTGCAGCAACTCGGCGTTGCCATCCCAGTCCCGC 937

Query:  870 AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 811
Sbjct:  938 AGCTGCACGGCCAGGCGGCTGTTGCGGTCCCCCGTGATGCTATGCACCTTCTCCAGACCC 997

Query:  810 AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 751
Sbjct:  998 AGCCAGAACTCGCCGTGGGGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTG 1057

Score = 936 (140.4 bits), Expect = 1.1e-190, Sum P(2) = 1.1e-190 (SEQ ID NO:131)
Identities = 312/407 (76%), Positives = 312/407 (76%), Strand = Minus / Plus

Query: 1315 CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 1256
Sbjct:  547 CTAGGAGGCTGCCTCTGCTGCCATGGGCTGGATCAACATGGTGGTGGCCTGCAGCGGGTA 606

Query: 1255 GTAGCGGCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGG 1196
Sbjct:  607 GTAGCGGCCCCGCCAGGTCTTCCAGAAGATTCCCTTCTTAAGCTTCTGCCGCTGCTGTGG 666

Query: 1195 GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 1136
Sbjct:  667 GATGGAGCGGAAGTACTGGCCGTTGAGGTTGGAATGGCTGCAGGTGCCAAACCACCAGCC 726

Query: 1135 TCCTGCCGGGGTCAGGG-G-AGAGG--GAACATGGTCAGGTCTTTGAGCCA-CCGATG 1083
Sbjct:  727 TCCAGAGAGGCTCTTGGCGCAGTTCTTGTCCCTGCGGAGGTCGT-GATCCTGGTCCCAAG 785

Query: 1024 AGTGGAGAAGGGTACGGAGAGGCCGCTGGGTG--GGACG-GTGGTGGCG-CCCAGCTGGC 969
Sbjct:  838 GCCGGCCACGGGTGCAGTGAG-CTGCAGGCTATAGGCCGTGTCCTCGCCACCCAGGTGCA 896
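The Minus / Plus HSPs in Table 31 display the query as its reverse complement. The short Python sketch below shows this with one example. Query bases 661-701, as listed on the plus strand in Table 29, reverse-complement to the first 41 bases of the Table 31 row for query 701-642. The function name is illustrative, and gap characters are not handled.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; gap characters are not handled."""
    return seq.translate(COMPLEMENT)[::-1]


# Query bases 661-701 as shown on the plus strand in Table 29
plus_661_701 = "CCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACC"
# Query row 701-642 as shown on the minus strand in Table 31
minus_701_642 = "GGTGCAGGCGGCTGACATTGTGAGCCGGGTCAACTGGCTGGGCCATCTCGGGCAGCCTCT"

# Bases 701 down to 661 are the first 41 characters of the minus-strand row.
print(reverse_complement(plus_661_701) == minus_701_642[:41])  # True
```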



Table 32. ClustalW alignment of CG57051-02 protein with related proteins. 



[ClustalW alignment graphic: CG57051_02 and Q9NZU4]



Information for the ClustalW proteins:

Accno                        Common Name                              Length
CG57051_02 (SEQ ID NO:55)    novel Angiopoietin-like protein          386
Q9NZU4 (SEQ ID NO:95)        HEPATIC ANGIOPOIETIN-RELATED PROTEIN     406

In the alignment shown above, black outlined amino acid residues indicate residues identically conserved between sequences (i.e., residues that may be required to preserve structural or functional properties); amino acid residues with a gray background are similar to one another between sequences, possessing comparable physical and/or chemical properties without altering protein structure or function (e.g., the group L, V, I and M may be considered similar); and amino acid residues with a white background are neither conserved nor similar between sequences.

Table 33. PSORT, Signal? and hydropathy results for CuraGen Acc, No. CG57051-02. 

endoplasmic reticulum (membrane) Certainty=0.8200 (Affirmative) < succ>

microbody (peroxisome) Certainty=0.3008 (Affirmative) < succ>

plasma membrane Certainty=0.1900 (Affirmative) < succ>

endoplasmic reticulum (lumen) Certainty=0.1000 (Affirmative) < succ>

INTEGRAL Likelihood = -4.04 Transmembrane 7-23 (4-25) 

Seems to be a Type Ib (Nexo Ccyt) membrane protein
Is the sequence a signal peptide? 
# Measure Position Value Cutoff Conclusion

max. C 31 0.427 0.37 YES

max. Y 31 0.473 0.34 YES

max. S 8 0.952 0.88 YES

mean S 1-30 0.738 0.48 YES



# Most likely cleavage site between pos. 30 and 31: VQS-KS 
[Figure: hydropathy plot for CG57051-02, window of 19 residues; x-axis: Amino Acid Number]
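The plot summarizes per-residue hydropathy averaged over a 19-residue sliding window. As a minimal sketch, assuming the Kyte-Doolittle scale (the source does not name the scale used) and taking only the first 60 residues, which the alignment in Table 37 shows are shared by CG57051-02 and CG57051-03, the windowed profile can be computed as follows; the most hydrophobic window falls in the N-terminal region where the signal sequence is predicted.

```python
# Minimal sketch of a sliding-window hydropathy profile.
# Assumption: Kyte-Doolittle values; the source does not name the scale it used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# First 60 residues of CG57051-03 (Table 35, Query 1-60).
N_TERMINUS = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each full window; entry i covers residues i+1 .. i+window."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


if __name__ == "__main__":
    profile = hydropathy_profile(N_TERMINUS)
    best = max(range(len(profile)), key=profile.__getitem__)
    print(f"{len(profile)} windows; most hydrophobic window starts at residue {best + 1} "
          f"(mean {profile[best]:.2f})")
    # 42 windows; most hydrophobic window starts at residue 3 (mean 1.72)
```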



SECP18 

A SECP18 nucleic acid and polypeptide according to the invention includes the nucleic
acid sequence (SEQ ID NO:56) and encoded polypeptide sequence (SEQ ID NO:57) of clone
CG57051-03 directed toward novel Angiopoietin-like proteins and nucleic acids
encoding them. Figure 23 illustrates the nucleic acid sequence and amino acid sequence,
respectively. This clone includes a nucleotide sequence (SEQ ID NO:56) of 1150 bp. The
nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation
codon at nucleotides 44-46 and ending with a TAG stop codon at nucleotides 1148-1150.
Putative untranslated regions, if any, are found upstream from the initiation codon and
downstream from the termination codon. The encoded protein, having 368 amino acid residues, is
presented using the one-letter code in Figure 23.
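The ORF coordinates can be cross-checked arithmetically: from the first nucleotide of the ATG (44) to the last nucleotide of the stop codon (1150) is 1107 nucleotides, or 369 codons, of which 368 encode amino acids. The sketch below, assuming the standard genetic code (NCBI translation table 1), also translates the first 180 nucleotides of SEQ ID NO:56 as they appear in the Query lines of Table 34; the helper names are illustrative.

```python
# Sketch: translate an ORF with the standard genetic code (NCBI table 1).
BASES = "TCAG"
# Amino acids for codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon until a stop codon or the last complete codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def orf_protein_length(start_nt: int, stop_end_nt: int) -> int:
    """Residues encoded by an ORF, given 1-based positions of the ATG's first
    nucleotide and the stop codon's last nucleotide (the stop is not counted)."""
    return (stop_end_nt - start_nt + 1) // 3 - 1


# Nucleotides 1-180 of SEQ ID NO:56 (Table 34, Query lines 1-180).
SEQ56_1_180 = (
    "CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC"
    "GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG"
    "ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA"
)

if __name__ == "__main__":
    print(orf_protein_length(44, 1150))  # 368
    # The ORF starts at nucleotide 44, i.e. index 43 in a 0-based string.
    print(translate(SEQ56_1_180[43:]))
    # MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLA
```

The same arithmetic applied to the CG57051-04 coordinates given later (ATG at 155, stop ending at 883) gives 242 residues.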

The protein encoded by clone CG57051-03 is predicted by the PSORT program to be 
located extracellularly with a certainty of 0.7332 and has a signal peptide (see Table 38 below). 
The PCR product derived by exon linking, covering the entire open reading frame, was cloned
into the pCR2.1 vector from Invitrogen to provide clone 134276::130294::PPAR-gamma.698782.P15.
The DNA and protein sequences for the novel Angiopoietin-like gene are
reported here as CuraGen Acc. No. CG57051-03.

Similarities

In a search of sequence databases, it was found, for example, that the nucleic acid
sequence of this invention has 837 of 1031 bases (81%) identical to a
gb:GENBANK-ID:AF202636|acc:AF202636.1 mRNA from Homo sapiens (Homo sapiens angiopoietin-like
protein PP1158 mRNA, complete cds) (Table 34). The full amino acid sequence of the protein of
the invention was found to have 184 of 192 amino acid residues (95%) identical to, and 184 of
192 amino acid residues (95%) similar to, the 406 amino acid residue
ptnr:SPTREMBL-ACC:Q9HBV4 protein from Homo sapiens (Human) (ANGIOPOIETIN-LIKE PROTEIN
PP1158) (Table 35).
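The percentages quoted with these counts appear to be truncated rather than rounded: 184/192 is 95.8% and is reported as 95%, and in Tables 34 and 36, 562/568 (98.9%) and 505/523 (96.6%) are reported as 98% and 96%. The sketch below assumes integer truncation is the convention of the search program and reproduces the reported values; the function name is illustrative.

```python
# Sketch: percent identity as reported in the search tables, assuming the value
# is truncated toward zero rather than rounded.
def reported_percent(matches: int, aligned: int) -> int:
    """Integer percent identity, truncated (e.g. 184/192 = 95.8% -> 95)."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * matches) // aligned


if __name__ == "__main__":
    for matches, aligned in [(837, 1031), (184, 192), (562, 568), (505, 523)]:
        exact = 100 * matches / aligned
        print(f"{matches}/{aligned}: exact {exact:.1f}%, "
              f"reported {reported_percent(matches, aligned)}%")
```

Rounding to the nearest integer would instead give 96% for 184/192, which does not match the tables.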

A multiple sequence alignment is given in Table 37, with the protein of the invention
shown on the first line in a ClustalW analysis comparing the protein of the invention with
related protein sequences. Note that this sequence represents a splice form of Angiopoietin, as
indicated by positions 183 to 221.

The presence of identifiable domains in the protein disclosed herein was determined by
searches versus domain databases such as Pfam, PROSITE, ProDom, Blocks or Prints and then
identified by the InterPro domain accession number. Significant domains are summarized below:

Model Domain seq-f seq-t hmm-f hmm-t score E-value

fibrinogen_C 1/2 184 246 47 123 102.6 2.2e-28

fibrinogen_C 2/2 288 362 178 272 61.3 1.4e-16

IPR002181; (Fibrinogen_C) 

Fibrinogen, the principal protein of vertebrate blood clotting, is a hexamer containing
two sets of three different chains (alpha, beta, and gamma), linked to each other by disulfide
bonds. The N-terminal sections of these three chains are evolutionarily related and contain the
cysteines that participate in the cross-linking of the chains. However, there is no similarity
between the C-terminal part of the alpha chain and that of the beta and gamma chains. The
C-terminal part of the beta and gamma chains forms a domain of about 270 amino acid residues.
As shown in the schematic representation, this domain contains four conserved cysteines
involved in two disulfide bonds.

(SEQ ID NO:126)

xjcxxCxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxx 

\ \ • II 

+ — + + — + 

'C': conserved cysteine involved in a disulfide bond.

This domain has recently been found in the other proteins listed below:

1) Two sea cucumber fibrinogen-like proteins (FReP-A and FReP-B). These proteins,
of about 260 amino acids, have a fibrinogen beta/gamma C-terminal domain.

2) In the C-terminus of the Drosophila protein scabrous (gene sca). Scabrous is involved in
the regulation of neurogenesis in Drosophila and may encode a lateral inhibitor of R8 cell
differentiation.

3) In the C-terminus of a mammalian T-cell specific protein of unknown function. 

4) In the C-terminus of a human protein of unknown function which is encoded on the
opposite strand of the steroid 21-hydroxylase/complement component C4 gene locus.

The function of this domain is not yet known, but it has been suggested that it could be 
involved in protein-protein interactions. 

This indicates that the sequence of the invention has properties similar to those of other
proteins known to contain this/these domain(s) and similar to the properties of these domains.

Chromosomal information: 

The Angiopoietin-like gene disclosed in this invention maps to chromosome 19p13.3.
This assignment was made using mapping information associated with genomic clones, public
genes and ESTs sharing sequence identity with the disclosed sequence and CuraGen
Corporation's Electronic Northern bioinformatic tool.
Tissue expression

The Angiopoietin-like gene disclosed in this invention is expressed in at least the
following tissues: Adipose, Liver, Placenta. Expression information was derived from the tissue
sources of the sequences that were included in the derivation of the sequence of CuraGen Acc.
No. CG57051-03.

Cellular Localization and Sorting 

The PSORT, SignalP and hydropathy profile for the Angiopoietin-like protein are shown
in Table 38. The results predict that this sequence has a signal peptide and is likely to be
localized extracellularly with a certainty of 0.7332. The signal peptide is predicted by SignalP to
be cleaved between amino acids 25 and 26: AQG-GP.
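Cleavage between amino acids 25 and 26 means the signal peptide comprises residues 1-25 and the predicted mature protein begins at residue 26; the notation AQG-GP shows the three residues before and two after the cut. A short sketch (function names are illustrative) using the N-terminal sequence from Table 35 also reproduces the VQS-KS site predicted at positions 30/31 for CG57051-02 in Table 33, whose first 60 residues are identical in the alignment of Table 37.

```python
# Sketch: split a precursor at a predicted signal-peptide cleavage site.
# N-terminal 60 residues of CG57051-03 (Table 35, Query 1-60).
N_TERMINUS = "MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE"


def cleave(precursor: str, site: int) -> tuple:
    """Cut between residue `site` and `site + 1` (1-based); return (signal, mature)."""
    return precursor[:site], precursor[site:]


def site_notation(precursor: str, site: int) -> str:
    """SignalP-style label: three residues before and two after the cut."""
    return precursor[site - 3:site] + "-" + precursor[site:site + 2]


if __name__ == "__main__":
    signal, mature = cleave(N_TERMINUS, 25)
    print(len(signal), mature[:10])        # 25 GPVQSKSPRF
    print(site_notation(N_TERMINUS, 25))   # AQG-GP (CG57051-03, Table 38)
    print(site_notation(N_TERMINUS, 30))   # VQS-KS (CG57051-02, Table 33)
```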

Functional Variants and Homologs 

The novel nucleic acid of the invention encoding an Angiopoietin-like protein includes the
nucleic acid whose sequence is provided in Figure 23, or a fragment thereof. The invention also
includes a mutant or variant nucleic acid any of whose bases may be changed from the
corresponding base shown in Figure 23 while still encoding a protein that maintains its
Angiopoietin-like activities and physiological functions, or a fragment of such a nucleic acid.
The invention further includes nucleic acids whose sequences are complementary to the
sequence of CuraGen Acc. No. CG57051-03, including nucleic acid fragments that are
complementary to any of the nucleic acids just described. The invention additionally includes
nucleic acids or nucleic acid fragments, or complements thereto, whose structures include
chemical modifications. Such modifications include, by way of non-limiting example, modified
bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These
modifications are carried out at least in part to enhance the chemical stability of the modified
nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in
therapeutic applications in a subject. In the mutant or variant nucleic acids, and their
complements, up to about 19% of the bases may be so changed.

The novel protein of the invention includes the Angiopoietin-like protein whose sequence
is provided in Figure 23. The invention also includes a mutant or variant protein any of whose
residues may be changed from the corresponding residue shown in Figure 23 while still providing
a protein that maintains its Angiopoietin-like activities and physiological functions, or a
functional fragment thereof. In the mutant or variant protein, up to about 5% of the amino acid
residues may be so changed.
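These percentage limits translate directly into a maximum number of changed positions. As a sketch, assuming substitutions only (no insertions or deletions) and treating "about" as an exact whole-percent cutoff, about 19% of the 1150 bases corresponds to at most 218 changed bases, and about 5% of the 368 residues to at most 18 changed residues; the helper names are illustrative.

```python
# Sketch: check a substitution-only variant against a percentage limit.
# Assumptions: equal-length sequences (no indels); limit given as a whole percent.
def changed_positions(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only")
    return sum(1 for a, b in zip(reference, variant) if a != b)


def max_changes(length: int, percent: int) -> int:
    """Largest whole number of positions that stays within `percent` of `length`."""
    return length * percent // 100


def within_limit(reference: str, variant: str, percent: int) -> bool:
    # Integer arithmetic avoids floating-point edge cases exactly at the boundary.
    return changed_positions(reference, variant) * 100 <= len(reference) * percent


if __name__ == "__main__":
    print(max_changes(1150, 19), max_changes(368, 5))  # 218 18
```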

Chimeric and Fusion Proteins 

The present invention includes chimeric or fusion proteins of the Angiopoietin-like
protein, in which the Angiopoietin-like protein of the present invention is joined to a second
polypeptide or protein that is not substantially homologous to the present novel protein. The
second polypeptide can be fused to either the amino-terminus or carboxyl-terminus of the present
CG57051-03 polypeptide. In certain embodiments a third nonhomologous polypeptide or protein
may also be fused to the novel Angiopoietin-like protein such that the second nonhomologous
polypeptide or protein is joined at the amino terminus, and the third nonhomologous polypeptide
or protein is joined at the carboxyl terminus, of the CG57051-03 polypeptide. Examples of
nonhomologous sequences that may be incorporated as either a second or third polypeptide or
protein include glutathione S-transferase, a heterologous signal sequence fused at the amino
terminus of the Angiopoietin-like protein, an immunoglobulin sequence or domain, a serum
protein or domain thereof (such as a serum albumin), an antigenic epitope, and a specificity
motif such as (His)6.

The invention further includes nucleic acids encoding any of the chimeric or fusion 
proteins described in the preceding paragraph. 

Antibodies

The invention further encompasses antibodies and antibody fragments, such as Fab,
F(ab')2 or single-chain Fv constructs, that bind immunospecifically to any of the proteins of the
invention. Also encompassed within the invention are peptides and polypeptides comprising
sequences having high binding affinity for any of the proteins of the invention, including such
peptides and polypeptides that are fused to any carrier particle (or biologically expressed on the
surface of a carrier) such as a bacteriophage particle.

Uses of the Compositions of the Invention 

The protein similarity information, expression pattern, cellular localization, and map
location for the protein and nucleic acid disclosed herein suggest that this Angiopoietin-like
protein may have important structural and/or physiological functions characteristic of the
Fibrinogen family. Therefore, the nucleic acids and proteins of the invention are useful in
potential diagnostic and therapeutic applications and as a research tool. These include serving as
a specific or selective nucleic acid or protein diagnostic and/or prognostic marker, wherein the
presence or amount of the nucleic acid or the protein is to be assessed. These also include
potential therapeutic applications such as the following: (i) a protein therapeutic, (ii) a small
molecule drug target, (iii) an antibody target (therapeutic, diagnostic, drug targeting/cytotoxic
antibody), (iv) a nucleic acid useful in gene therapy (gene delivery/gene ablation), (v) an agent
promoting tissue regeneration in vitro and in vivo, and (vi) a biological defense weapon.

The nucleic acids and proteins of the invention have applications in the diagnosis and/or
treatment of various diseases and disorders. For example, the compositions of the present
invention will have efficacy for the treatment of patients suffering from: type II diabetes; obesity;
colon cancer; diabetes mellitus, insulin-resistant, with acanthosis nigricans and hypertension;
3-methylglutaconicaciduria, type III; cone-rod retinal dystrophy-2; DNA ligase I deficiency;
glutaricaciduria, type IIB; liposarcoma; myotonic dystrophy; as well as other diseases, disorders
and conditions.

These materials are further useful in the generation of antibodies that bind 
immunospecifically to the novel substances of the invention for use in diagnostic and/or 
therapeutic methods. 

Table 34. BLASTN search using CuraGen Acc. No. CG57051-03.

>gb:GENBANK-ID:AF202636|acc:AF202636.1 Homo sapiens angiopoietin-like protein
PP1158 mRNA, complete cds - Homo sapiens, 1943 bp. (SEQ ID NO:97)
Length = 1943

Plus Strand HSPs:
Score = 2967 (445.2 bits), Expect = 3.2e-128, P = 3.2e-128

Identities = 837/1031 (81%), Positives = 837/1031 (81%), Strand = Plus / Plus 
Query:    1 CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  130 CCCCGAGAGTCCCCGAATCCCCGCTCCCAGGCTACCTAAGAGGATGAGCGGTGCTCCGAC 189

Query:   61 GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  190 GGCCGGGGCAGCCCTGATGCTCTGCGCCGCCACCGCCGTGCTACTGAGCGCTCAGGGCGG 249

Query:  121 ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  250 ACCCGTGCAGTCCAAGTCGCCGCGCTTTGCGTCCTGGGACGAGATGAATGTCCTGGCGCA 309

Query:  181 CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  310 CGGACTCCTGCAGCTCGGCCAGGGGCTGCGCGAACACGCGGAGCGCACCCGCAGTCAGCT 369

Query:  241 GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTC 300
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  370 GAGCGCGCTGGAGCGGCGCCTGAGCGCGTGCGGGTCCGCCTGTCAGGGAACCGAGGGGTC 429

Query:  301 CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 360
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  430 CACCGACCTCCCGTTAGCCCCTGAGAGCCGGGTGGACCCTGAGGTCCTTCACAGCCTGCA 489

Query:  361 GACACAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 420
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  490 GACACAACTCAAGGCTCAGAACAGCAGGATCCAGCAACTCTTCCACAAGGTGGCCCAGCA 549

Query:  421 GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 480
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  550 GCAGCGGCACCTGGAGAAGCAGCACCTGCGAATTCAGCATCTGCAAAGCCAGTTTGGCCT 609

Query:  481 CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 540
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  610 CCTGGACCACAAGCACCTAGACCATGAGGTGGCCAAGCCTGCCCGAAGAAAGAGGCTGCC 669

Query:  541 CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCA--TGG--AG 596
            ||||||||||||||||||||||||||||||||||||||||||||||||||   ||   ||
Sbjct:  670 CGAGATGGCCCAGCCAGTTGACCCGGCTCACAATGTCAGCCGCCTGCACCGGCTGCCCAG 729

Query:  597 GC-TGGACAGTAA-T-TCAGAGGC-GCCACGATGGCTCAGTGGACTTCAACCGGCCCTGG 652
            |  | | ||| |  | |   |||  |    || |||  ||||||||         || |
Sbjct:  730 GGATTGCCAGGAGCTGTTCCAGGTTGGGGAGA-GGCAGAGTGGACTATTTGAAATCCAGC 788

Query:  653 GA-AGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGG-GTCTGGAGA 710
               ||    |   ||    ||||| || |  || |  ||  || |  ||| | ||||| |
Sbjct:  789 CTCAGGGGTCTCCGCCATTTTTGGTGAACTGCAAGATGACCTCAGA-TGGAGGCTGGACA 847

Query:  711 AGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCGGGACTGGG 770
             | |  ||  ||  |  || | || | |  ||    ||| | | || |  |||  |||||
Sbjct:  848 -G-TA-ATT-CAG-A--GGCG-CCACGATGGCTCAGTGGACTT-CAAC--CGGCCCTGGG 896

[The remaining rows of this HSP, from Query 771 / Sbjct 897 onward, are not legible in the source.]



Score = 2774 (416.2 bits). Expect = 1.6e-119, P = 1.6e-119 

Identities = 562/568 (98%), Positives = 562/568 (98%), Strand = Plus / Plus 

Query:  583 CCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 642
            ||| ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  828 CCT-CAG-ATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCTCAGTGGACTTCAA 885

Query:  643 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 702
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  886 CCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCGAGTTCTGGCTGGG 945

Query:  703 TCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 762
            |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  946 TCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGGCCGTGCAGCTGCG 1005

Query:  763 GGACTGGGATGACAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 822
            ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1006 GGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGGGTGGCGAGGACAC 1065

Query:  823 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 882
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1066 GGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCGCCACCACCGTCCC 1125

Query:  883 ACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 942
            ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Sbjct: 1126 ACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACGACCTCCGCAGGGA 1185

Query:  943 CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1002
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1186 CAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCTGCAGCCATTCCAA 1245

Query: 1003 CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 1062
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1246 CCTCAACGGCCAGTACTTCCGCTCCATCCCACAGCAGCGGCAGAAGCTTAAGAAGGGAAT 1305

Query: 1063 CTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCA 1122
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1306 CTTCTGGAAGACCTGGCGGGGCCGCTACTACCCGCTGCAGGCCACCACCATGTTGATCCA 1365

Query: 1123 GCCCATGGCAGCAGAGGCAGCCTCCTAG 1150
            ||||||||||||||||||||||||||||
Sbjct: 1366 GCCCATGGCAGCAGAGGCAGCCTCCTAG 1393
Table 35. BLASTP search using the protein of CuraGen Acc. No. CG57051-03.

>ptnr:SPTREMBL-ACC:Q9HBV4 ANGIOPOIETIN-LIKE PROTEIN PP1158 - Homo sapiens
(Human), 406 aa. (SEQ ID NO:98)
Length = 406

Score = 1009 (355.2 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 184/192 (95%), Positives = 184/192 (95%)

Query:  177 NVSRLHHGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 236
            |      |||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  215 NCKMTSDGGWTVIQRRHDGSVDFNRPWEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLA 274

Query:  237 VQLRDWDDNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFPTWDQDHD 296
            ||||||| |||||||||||||||||||||||||||||||||||||||||||| |||||||
Sbjct:  275 VQLRDWDGNAELLQFSVHLGGEDTAYSLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHD 334

Query:  297 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 356
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  335 LRRDKNCAKSLSGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATT 394

Query:  357 MLIQPMAAEAAS 368
            ||||||||||||
Sbjct:  395 MLIQPMAAEAAS 406

Score = 934 (328.8 bits), Expect = 4.3e-198, Sum P(2) = 4.3e-198
Identities = 182/182 (100%), Positives = 182/182 (100%)

Query:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE 60

Query:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF 120

Query:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR 180

Query:  181 LH 182
            ||
Sbjct:  181 LH 182



Table 36. BLASTN identity search of CuraGen Corporation's Human SeqCalling 
database using CuraGen Acc. No. CG57051-03. 
>s3aq: 189266374 Sequence 5 from Patent WO0105825 (AX079971.1: 100%/409,
p=1.2e-238), 550 bp. (SEQ ID NO:99)
Length = 550

Plus Strand HSPs:

Score = 2723 (408.6 bits), Expect = 1.8e-117, P = 1.8e-117 

Identities = 547/550 (99%), Positives = 547/550 (99%), Strand = Plus / Plus 

Query:  450 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 509
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GAATTCAGCATCTGCAAAGCCAGTTTGGCCTCCTGGACCACAAGCACCTAGACCATGAGG 60

Query:  510 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 569
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:   61 TGGCCAAGCCTGCCCGAAGAAAGAGGCTGCCCGAGATGGCCCAGCCAGTTGACCCGGCTC 120

Query:  570 ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 629
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  121 ACAATGTCAGCCGCCTGCACCATGGAGGCTGGACAGTAATTCAGAGGCGCCACGATGGCT 180

Query:  630 CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 689
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  181 CAGTGGACTTCAACCGGCCCTGGGAAGCCTACAAGGCGGGGTTTGGGGATCCCCACGGCG 240

Query:  690 AGTTCTGGCTGGGTCTGGAGAAGGTCCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 749
            ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Sbjct:  241 AGTTCTGGCTGGGTCTGGAGAAGGTGCATAGCATCACGGGGGACCGCAACAGCCGCCTGG 300

Query:  750 CCGTGCAGCTGCGGGACTGGGATGACAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 809
            |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Sbjct:  301 CCGTGCAGCTGCGGGACTGGGATGGCAACGCCGAGTTGCTGCAGTTCTCCGTGCACCTGG 360

Query:  810 GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 869
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  361 GTGGCGAGGACACGGCCTATAGCCTGCAGCTCACTGCACCCGTGGCCGGCCAGCTGGGCG 420

Query:  870 CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCCCCACTTGGGACCAGGATCACG 929
            |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
Sbjct:  421 CCACCACCGTCCCACCCAGCGGCCTCTCCGTACCCTTCTCCACTTGGGACCAGGATCACG 480

Query:  930 ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 989
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  481 ACCTCCGCAGGGACAAGAACTGCGCCAAGAGCCTCTCTGGAGGCTGGTGGTTTGGCACCT 540

Query:  990 GCAGCCATTC 999
            ||||||||||
Sbjct:  541 GCAGCCATTC 550



>s3aq: 188990257 Homo sapiens angiopoietin-related protein mRNA, complete cds
(AF153606.1: 99%/475, p=1.9e-259), 652 bp. (SEQ ID NO:100)
Length = 652

Minus Strand HSPs:

Score = 2403 (360.5 bits), Expect = 4.2e-103, P = 4.2e-103
Identities = 505/523 (96%), Positives = 505/523 (96%), Strand = Minus / Plus

Query:  520 AGGCTTGGCCACC-TCATGGTCTAGGTG-CTT-GTGGTCCAG-GAGGCCAAACTGGCTTT 465
            || | ||| | || ||| | || | ||| ||  ||   |||  |||||||  ||||||||
Sbjct:  128 AGCCCTGGTCCCCGTCA-G-TCAATGTGACTGAGTCCGCCATTGAGGCCAGTCTGGCTTT 185

Query:  464 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 405
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  186 GCAGATGCTGAATTCGCAGGTGCTGCTTCTCCAGGTGCCGCTGCTGCTGGGCCACCTTGT 245

Query:  404 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 345
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  246 GGAAGAGTTGCTGGATCCTGCTGTTCTGAGCCTTGAGTTGTGTCTGCAGGCTGTGAAGGA 305

Query:  344 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 285
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  306 CCTCAGGGTCCACCCGGCTCTCAGGGGCTAACGGGAGGTCGGTGGACCCCTCGGTTCCCT 365

Query:  284 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 225
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  366 GACAGGCGGACCCGCACGCGCTCAGGCGCCGCTCCAGCGCGCTCAGCTGACTGCGGGTGC 425

Query:  224 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 165
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  426 GCTCCGCGTGTTCGCGCAGCCCCTGGCCGAGCTGCAGGAGTCCGTGCGCCAGGACATTCA 485

Query:  164 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 105
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  486 TCTCGTCCCAGGACGCAAAGCGCGGCGACTTGGACTGCACGGGTCCGCCCTGAGCGCTCA 545

Query:  104 GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 45
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:  546 GTAGCACGGCGGTGGCGGCGCAGAGCATCAGGGCTGCCCCGGCCGTCGGAGCACCGCTCA 605

Query:   44 TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCT-CGGGG 1
            ||||||||||||||||||||||||||||||||||||||| |||||
Sbjct:  606 TCCTCTTAGGTAGCCTGGGAGCGGGGATTCGGGGACTCTTCGGGG 650



>s3aq: 164987939 Category E: Homo sapiens angiopoietin-related protein mRNA,
complete cds (AF153606.1: 100%/150, p=1.9e-084), 228 bp. (SEQ ID NO:101)
Length = 228

Minus Strand HSPs:

Score = 480 (72.0 bits), Expect = 2.7e-31, Sum P(2) = 2.7e-31
Identities = 96/96 (100%), Positives = 96/96 (100%), Strand = Minus / Plus

[The alignment rows of this HSP (Query 590 / Sbjct 133 and Query 530 / Sbjct 193) are not legible in the source.]

Score = 410, Expect = 2.7e-31, Sum P(2) = 2.7e-31
(SEQ ID NO:132)
Identities = 86/91 (94%), Positives = 86/91 (94%), Strand = Minus / Plus

Query:  681 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 622
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct:    1 GGATCCCCAAACCCCGCCTTGTAGGCTTCCCAGGGCCGGTTGAAGTCCACTGAGCCATCG 60

Query:  621 TGGCGCCTCTGAATTACTGTCCAGCCTCCAT 591
            |||||||||||||||| ||||||  || | |
Sbjct:   61 TGGCGCCTCTGAATTAATGTCCACTCTGCCT 91



Table 37. ClustalW alignment of CG57051-03 protein with related proteins.

CG57051-03  MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE
Q9HBV4      MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE
CG57051-02  MSGAPTAGAALMLCAATAVLLSAQGGPVQSKSPRFASWDEMNVLAHGLLQLGQGLREHAE

CG57051-03  RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF
Q9HBV4      RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF
CG57051-02  RTRSQLSALERRLSACGSACQGTEGSTDLPLAPESRVDPEVLHSLQTQLKAQNSRIQQLF

CG57051-03  HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR
Q9HBV4      HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR
CG57051-02  HKVAQQQRHLEKQHLRIQHLQSQFGLLDHKHLDHEVAKPARRKRLPEMAQPVDPAHNVSR

CG57051-03  LH--------------------------------------HGGWTVIQRRHDGSVDFNRP
Q9HBV4      LHRLPRDCQELFQVGERQSGLFEIQPQGSPPFLVNCKMTSDGGWTVIQRRHDGSVDFNRP
CG57051-02  LH--------------------------------------HGGWTVIQRRHDGSVDFNRP

CG57051-03  WEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLAVQLRDWDDNAELLQFSVHLGGEDTAY
Q9HBV4      WEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLAVQLRDWDGNAELLQFSVHLGGEDTAY
CG57051-02  WEAYKAGFGDPHGEFWLGLEKVHSITGDRNSRLAVQLRDWDGNAELLQFSVHLGGEDTAY

CG57051-03  SLQLTAPVAGQLGATTVPPSGLSVPFPTWDQDHDLRRDKNCAKSLS--------------
Q9HBV4      SLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHDLRRDKNCAKSLS--------------
CG57051-02  SLQLTAPVAGQLGATTVPPSGLSVPFSTWDQDHDLRRDKNCAKSLSAPSVAQRPDHVPSP

CG57051-03  ----GGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAA
Q9HBV4      ----GGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAA
CG57051-02  LTPAGGWWFGTCSHSNLNGQYFRSIPQQRQKLKKGIFWKTWRGRYYPLQATTMLIQPMAA
Information for the ClustalW proteins:

Accno                       Common Name                           Length

CG57051-03 (SEQ ID NO:49)   novel Angiopoietin-like protein       368

Q9HBV4 (SEQ ID NO:80)       ANGIOPOIETIN-LIKE PROTEIN PP1158      406

CG57051-02 (SEQ ID NO:55)   Angiopoietin-like protein isoform 2   386



In the alignment shown above, black outlined amino acid residues indicate residues
identically conserved between sequences (i.e., residues that may be required to preserve
structural or functional properties); amino acid residues with a gray background are similar to
one another between sequences, possessing comparable physical and/or chemical properties
without altering protein structure or function (e.g., the group L, V, I, and M may be considered
similar); and amino acid residues with a white background are neither conserved nor similar
between sequences.

Table 38. PSORT, SignalP and hydropathy results for CuraGen Acc. No. CG57051-03.



outside Certainty=0.7332 (Affirmative) < succ>

microbody (peroxisome) Certainty=0.2527 (Affirmative) < succ>

endoplasmic reticulum (membrane) Certainty=0.1000 (Affirmative) < succ>

endoplasmic reticulum (lumen) Certainty=0.1000 (Affirmative) < succ>



Is the sequence a signal peptide?

# Measure Position Value Cutoff Conclusion

max. C 31 0.306 0.37 NO 

max. Y 26 0.429 0.34 YES 

max. S 8 0.952 0.88 YES 

mean S 1-25 0.848 0.48 YES 

# Most likely cleavage site between pos. 25 and 26: AQG-GP
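Each SignalP row compares a score with a fixed cutoff, and the Conclusion column reads YES when the value exceeds it. The sketch below reproduces the Conclusion column of this table and of Table 33 from the listed values only; how SignalP combines the individual measures into its overall call is not described in this document and is not modeled here.

```python
# Sketch: reproduce the SignalP "Conclusion" column (YES when value > cutoff).
def conclusions(rows):
    """rows: iterable of (measure, position, value, cutoff) tuples."""
    return [(measure, "YES" if value > cutoff else "NO")
            for measure, _position, value, cutoff in rows]


TABLE_38 = [  # CG57051-03
    ("max. C", "31", 0.306, 0.37),
    ("max. Y", "26", 0.429, 0.34),
    ("max. S", "8", 0.952, 0.88),
    ("mean S", "1-25", 0.848, 0.48),
]
TABLE_33 = [  # CG57051-02
    ("max. C", "31", 0.427, 0.37),
    ("max. Y", "31", 0.473, 0.34),
    ("max. S", "8", 0.952, 0.88),
    ("mean S", "1-30", 0.738, 0.48),
]

if __name__ == "__main__":
    for measure, verdict in conclusions(TABLE_38):
        print(measure, verdict)  # only max. C falls below its cutoff (NO)
```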



[Figure: hydropathy plot for CG57051-03; x-axis: Amino Acid Number]
CG57051-04 directed toward novel Angiopoietin-like proteins and nucleic acids encoding them. Figure 20 illustrates the nucleic acid and amino acid sequences, respectively. This clone includes a nucleotide sequence (SEQ ID NO:50) of 937 bp. The nucleotide sequence includes an open reading frame (ORF) beginning with an ATG initiation codon at nucleotides 155-157 and ending with a TAG stop codon at nucleotides 881-883. Putative untranslated regions, if any, are found upstream from the initiation codon and downstream from the termination codon. The encoded protein, 242 amino acid residues in length, is presented using the one-letter code in Figure 20. The protein encoded by clone CG57051-04 is predicted by the PSORT program to be located at the endoplasmic reticulum with a certainty of 0.8200, and appears to be a signal protein (see Table 27 below).
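The ORF coordinates above can be checked arithmetically. Counting from the first base of the ATG (nt 155) to the last base of the TAG (nt 883) gives 883 - 155 + 1 = 729 nt, or 243 codons; the last codon is the stop codon, which encodes no residue, leaving 242 amino acids. A minimal sketch of this check follows; the function name `orf_protein_length` is hypothetical and assumes 1-based, inclusive coordinates.

```python
def orf_protein_length(start: int, stop_end: int) -> int:
    """Number of amino acids encoded by an ORF.

    `start` is the first base of the initiation codon and `stop_end` the last
    base of the stop codon (1-based, inclusive coordinates).
    """
    span = stop_end - start + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # subtract the stop codon, which encodes no residue


# CG57051-04: ATG at nt 155-157, TAG at nt 881-883 (values from the text).
print(orf_protein_length(155, 883))  # 242
```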

SECP Nucleic Acids 

The novel nucleic acids of the invention include those that encode a SECP or SECP-like protein, or biologically-active portions thereof. The nucleic acids include nucleic acids encoding polypeptides that include the amino acid sequence of one or more of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. The encoded polypeptides can thus include, e.g., the amino acid sequences of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In some embodiments, a SECP polypeptide or protein, as disclosed herein, includes the product of a naturally-occurring polypeptide, precursor form, pro-protein, or mature form of the polypeptide. The naturally-occurring polypeptide, precursor, or pro-protein includes, e.g., the full-length gene product, encoded by the corresponding gene. The naturally-occurring polypeptide also includes the polypeptide, precursor or pro-protein encoded by an open reading frame (ORF) described herein. As used herein, "identical" residues are those residues, in a comparison between two aligned sequences, at which the equivalent nucleotide base or amino acid residue is the same in both sequences. Residues are alternatively described as "similar" or "positive" when the comparison between two aligned sequences shows that the residues in an equivalent position are either the same amino acid residue or a conserved amino acid residue, as defined below.

As used herein, a "mature" form of a polypeptide or protein disclosed in the present invention is the product of a naturally occurring polypeptide or precursor form or proprotein. The naturally occurring polypeptide, precursor or proprotein includes, by way of nonlimiting example, the full length gene product, encoded by the corresponding gene. Alternatively, it may be defined as the polypeptide, precursor or proprotein encoded by an open reading frame described herein. The "mature" form arises, again by way of nonlimiting example, as a result of one or more naturally occurring processing steps as they may take place within the cell, or host cell, in which the gene product arises. Examples of such processing steps leading to a "mature" form of a polypeptide or protein include the cleavage of the amino-terminal methionine residue encoded by the initiation codon of an open reading frame, or the proteolytic cleavage of a signal peptide or leader sequence. Thus, a mature form arising from a precursor polypeptide or protein that has residues 1 to N, where residue 1 is the amino-terminal methionine, would have residues 2 through N remaining after removal of the amino-terminal methionine. Alternatively, a mature form arising from a precursor polypeptide or protein having residues 1 to N, in which an amino-terminal signal sequence from residue 1 to residue M is cleaved, would have the residues from residue M+1 to residue N remaining. Further, as used herein, a "mature" form of a polypeptide or protein may arise from a step of post-translational modification other than a proteolytic cleavage event. Such additional processes include, by way of non-limiting example, glycosylation, myristoylation or phosphorylation. In general, a mature polypeptide or protein may result from the operation of only one of these processes, or a combination of any of them.
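The residue bookkeeping for the two cleavage cases above (residues 2 to N after methionine removal; residues M+1 to N after removal of a signal sequence spanning 1 to M) can be sketched as follows. For CG57051-03, the SignalP output in Table 38 places the most likely cleavage site between positions 25 and 26, which corresponds to M = 25 in this notation. The function names and the toy precursor sequence are hypothetical; Python's 0-based slicing means residue M+1 is at index M.

```python
def mature_after_met_removal(precursor: str) -> str:
    """Residues 2..N that remain after removal of the initiator methionine."""
    if not precursor.startswith("M"):
        raise ValueError("expected an amino-terminal methionine")
    return precursor[1:]


def mature_after_signal_cleavage(precursor: str, m: int) -> str:
    """Residues M+1..N that remain after a signal sequence spanning
    residues 1..M (1-based) is cleaved off."""
    if not 0 < m < len(precursor):
        raise ValueError("cleavage position must fall inside the sequence")
    return precursor[m:]  # index m is residue m+1 in 1-based numbering


# Hypothetical 29-residue toy precursor with a 25-residue "signal sequence".
toy = "M" + "A" * 24 + "GPLE"
print(mature_after_signal_cleavage(toy, 25))  # GPLE
print(mature_after_met_removal("MKTL"))       # KTL
```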

In some embodiments, a nucleic acid encoding a polypeptide having the amino acid sequence of one or more of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 includes the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54, and 56, or a fragment thereof. Additionally, the invention includes mutant or variant nucleic acids of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a fragment thereof, any of whose bases may be changed from the disclosed sequence while still encoding a protein that maintains its SECP-like biological activities and physiological functions. The invention further includes the complement of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, including fragments, derivatives, analogs and homologs thereof. The invention additionally includes nucleic acids or nucleic acid fragments, or complements thereto, whose structures include chemical modifications.

Also included are nucleic acid fragments sufficient for use as hybridization probes to identify SECP-encoding nucleic acids (e.g., SECP mRNA) and fragments for use as polymerase chain reaction (PCR) primers for the amplification or mutation of SECP nucleic acid molecules. As used herein, the term "nucleic acid molecule" is intended to include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), analogs of the DNA or RNA generated using nucleotide analogs, and derivatives, fragments, and homologs thereof. The nucleic acid molecule can be single-stranded or double-stranded, but preferably is double-stranded DNA.

The term "probes" refers to nucleic acid sequences of variable length, preferably at least about 10 nucleotides (nt) or 100 nt and as long as about, e.g., 6,000 nt, depending upon the specific use. Probes are used in the detection of identical, similar, or complementary nucleic acid sequences. Longer probes are usually obtained from a natural or recombinant source, are highly specific, and hybridize much more slowly than oligomers. Probes may be single- or double-stranded, and may also be designed to have specificity in PCR, membrane-based hybridization technologies, or ELISA-like technologies.

An "isolated" nucleic acid molecule is a nucleic acid that is separated from other nucleic acid molecules that are present in the natural source of the nucleic acid. Examples of isolated nucleic acid molecules include, but are not limited to, recombinant DNA molecules contained in a vector, recombinant DNA molecules maintained in a heterologous host cell, partially or substantially purified nucleic acid molecules, and synthetic DNA or RNA molecules. Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5'- and 3'-termini of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. For example, in various embodiments, the isolated SECP nucleic acid molecule can contain less than approximately 50 kb, 25 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleotide sequences which naturally flank the nucleic acid molecule in genomic DNA of the cell from which the nucleic acid is derived. Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material or culture medium when produced by recombinant techniques, or of chemical precursors or other chemicals when chemically synthesized.

A nucleic acid molecule of the invention, e.g., a nucleic acid molecule having the nucleotide sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a complement of any of these nucleotide sequences, can be isolated using standard molecular biology techniques and the sequence information provided herein. Using all or a portion of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 as a hybridization probe, SECP nucleic acid sequences can be isolated using standard hybridization and cloning techniques (e.g., as described in Sambrook et al., eds., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989; and Ausubel et al., eds., Current Protocols in Molecular Biology, John Wiley & Sons, New York, NY, 1993).
A nucleic acid of the invention can be amplified using cDNA, mRNA or, alternatively, genomic DNA as a template and appropriate oligonucleotide primers according to standard PCR amplification techniques. The nucleic acid so amplified can be cloned into an appropriate vector and characterized by DNA sequence analysis. Furthermore, oligonucleotides corresponding to SECP nucleotide sequences can be prepared by standard synthetic techniques, e.g., using an automated DNA synthesizer.

As used herein, the term "oligonucleotide" refers to a series of linked nucleotide residues, which oligonucleotide has a sufficient number of nucleotide bases to be used in PCR. A short oligonucleotide sequence may be based on, or designed from, a genomic or cDNA sequence and is used to amplify, confirm, or reveal the presence of an identical, similar or complementary DNA or RNA in a particular cell or tissue. Oligonucleotides comprise portions of a nucleic acid sequence about 10 nt, 50 nt, or 100 nt in length, preferably about 15 nt to 30 nt in length. In one embodiment, an oligonucleotide comprising a nucleic acid molecule less than 100 nt in length would further comprise at least 6 contiguous nucleotides of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a complement thereof. Oligonucleotides may be chemically synthesized and may also be used as probes.
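The embodiment above (an oligonucleotide under 100 nt that contains at least 6 contiguous nucleotides of a reference sequence or its complement) reduces to a sliding-window test. The sketch below is illustrative only: it treats "complement" as the reverse complement read 5' to 3', and the names `shares_contiguous_run`, `qualifies` and the short reference sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_contiguous_run(oligo: str, reference: str, k: int = 6) -> bool:
    """True if some k-nt window of `oligo` occurs in `reference` or in its
    reverse complement."""
    oligo, reference = oligo.upper(), reference.upper()
    targets = (reference, reverse_complement(reference))
    for i in range(len(oligo) - k + 1):
        window = oligo[i:i + k]
        if any(window in target for target in targets):
            return True
    return False


def qualifies(oligo: str, reference: str, max_len: int = 100, k: int = 6) -> bool:
    """Length below `max_len` and at least `k` contiguous shared nucleotides."""
    return len(oligo) < max_len and shares_contiguous_run(oligo, reference, k)


ref = "ATGCCCGGGTTTAAA"  # hypothetical reference fragment
print(qualifies("CCGGGTTT", ref))  # True: shares CCGGGT with the sense strand
print(qualifies("AAACCC", ref))    # True: found in the reverse complement
print(qualifies("ACACAC", ref))    # False
```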

In another embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule that is a complement of the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In still another embodiment, an isolated nucleic acid molecule of the invention comprises a nucleic acid molecule that is a complement of the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or a portion of this nucleotide sequence. A nucleic acid molecule that is complementary to the nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 is one that is sufficiently complementary to that nucleotide sequence that it can hydrogen bond to it with few or no mismatches, thereby forming a stable duplex.
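Complementarity in this sense can be checked by pairing two strands antiparallel and counting positions that do not form Watson-Crick pairs. This is a minimal sketch under stated assumptions: it ignores Hoogsteen pairing, G-U wobble and duplex thermodynamics, and the `max_mismatches` threshold standing in for "few or no mismatches" is a hypothetical value, not one given in the text. All function names are hypothetical.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Perfectly complementary strand, written 5'->3' (Watson-Crick only)."""
    return seq.translate(COMPLEMENT)[::-1]


def antiparallel_mismatches(top: str, bottom: str) -> int:
    """Count non-Watson-Crick positions when `bottom` (written 5'->3') pairs
    antiparallel with `top` (also written 5'->3')."""
    if len(top) != len(bottom):
        raise ValueError("strands must be the same length")
    pairs = zip(top.upper(), reversed(bottom.upper()))
    return sum((a, b) not in PAIRS for a, b in pairs)


def forms_duplex(top: str, bottom: str, max_mismatches: int = 1) -> bool:
    """Hypothetical criterion: tolerate at most `max_mismatches` mismatches."""
    return antiparallel_mismatches(top, bottom) <= max_mismatches


print(reverse_complement("ATGC"))               # GCAT
print(antiparallel_mismatches("ATGC", "GCAT"))  # 0
print(antiparallel_mismatches("ATGC", "GCTT"))  # 1
```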

As used herein, the term "complementary" refers to Watson-Crick or Hoogsteen base-pairing between nucleotide units of a nucleic acid molecule, whereas the term "binding" is defined as the physical or chemical interaction between two polypeptides or compounds or associated polypeptides or compounds or combinations thereof. Binding includes ionic, non-ionic, van der Waals, hydrophobic interactions, and the like. A physical interaction can be either direct or indirect. Indirect interactions may be through or due to the effects of another polypeptide or compound. Direct binding refers to interactions that do not take place through, or due to, the effect of another polypeptide or compound, but instead are without other substantial chemical intermediates.

Additionally, the nucleic acid molecule of the invention can comprise only a portion of the nucleic acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54, and 56, e.g., a fragment that can be used as a probe or primer, or a fragment encoding a biologically active portion of SECP. Fragments provided herein are defined as sequences of at least 6 (contiguous) nucleic acids or at least 4 (contiguous) amino acids, a length sufficient to allow for specific hybridization in the case of nucleic acids or for specific recognition of an epitope in the case of amino acids, respectively, and are at most some portion less than a full-length sequence. Fragments may be derived from any contiguous portion of a nucleic acid or amino acid sequence of choice. Derivatives are nucleic acid sequences or amino acid sequences formed from the native compounds either directly or by modification or partial substitution. Analogs are nucleic acid sequences or amino acid sequences that have a structure similar to, but not identical to, the native compound, but differ from it in respect to certain components or side chains. Analogs may be synthetic or from a different evolutionary origin and may have a similar or opposite metabolic activity compared to wild-type.

Derivatives and analogs may be full-length or other than full-length, if the derivative or analog contains a modified nucleic acid or amino acid, as described below. Derivatives or analogs of the nucleic acids or proteins of the invention include, but are not limited to, molecules comprising regions that are substantially homologous to the nucleic acids or proteins of the invention, in various embodiments, by at least about 70%, 80%, 85%, 90%, 95%, 98%, or even 99% identity (with a preferred identity of 80-99%) over a nucleic acid or amino acid sequence of identical size or when compared to an aligned sequence in which the alignment is done by a computer homology program known in the art, or whose encoding nucleic acid is capable of hybridizing to the complement of a sequence encoding the aforementioned proteins under stringent, moderately stringent, or low stringent conditions. See, e.g., Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, New York, NY, 1993, and below. An exemplary program is the Gap program (Wisconsin Sequence Analysis Package, Version 8 for UNIX, Genetics Computer Group, University Research Park, Madison, WI) using the default settings, which uses the algorithm of Smith and Waterman (Adv. Appl. Math., 1981, 2: 482-489), which is incorporated herein by reference in its entirety.
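To make the percent-identity figures above concrete, here is a minimal local-alignment sketch in the style of Smith and Waterman (1981), with a linear gap penalty, followed by a percent-identity calculation over the aligned region. It is not the GCG Gap program and does not reproduce its default settings: the match, mismatch and gap scores are hypothetical values chosen for illustration, and "identical columns divided by aligned columns, gaps included" is only one of several percent-identity conventions. Function names are hypothetical.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[str, str, int]:
    """Local alignment with a linear gap penalty (hypothetical scores).

    Returns the two aligned substrings ('-' marks a gap) and the best score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        score = h[i][j]
        if score == h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), best


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical (non-gap) columns as a percentage of all aligned columns."""
    if not aln_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


x, y, s = smith_waterman("ACGTTACG", "ACGATACG")
print(x, y, s, percent_identity(x, y))  # ACGTTACG ACGATACG 13 87.5
```

In the usage line, seven of eight aligned positions match, so the two toy sequences would fall in the 80-99% range discussed above under this particular convention.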






The term "homologous nucleic acid sequence" or "homologous amino acid sequence," or variations thereof, refers to sequences characterized by a homology at the nucleotide level or amino acid level as previously discussed. Homologous nucleotide sequences encode those sequences coding for isoforms of SECP polypeptide. Isoforms can be expressed in different tissues of the same organism as a result of, e.g., alternative splicing of RNA. Alternatively, isoforms can be encoded by different genes. In the invention, homologous nucleotide sequences include nucleotide sequences encoding a SECP polypeptide of species other than humans, including, but not limited to, mammals, and thus can include, e.g., mouse, rat, rabbit, dog, cat, cow, horse, and other organisms. Homologous nucleotide sequences also include, but are not limited to, naturally occurring allelic variations and mutations of the nucleotide sequences set forth herein. A homologous nucleotide sequence does not, however, include the nucleotide sequence encoding human SECP protein. Homologous nucleic acid sequences include those nucleic acid sequences that encode conservative amino acid substitutions (see below) in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, as well as a polypeptide having SECP activity. Biological activities of the SECP proteins are described below. A homologous amino acid sequence does not encode the amino acid sequence of a human SECP polypeptide.

The nucleotide sequence determined from the cloning of the human SECP gene allows for the generation of probes and primers designed for use in identifying the cell types disclosed and/or cloning SECP homologues in other cell types, e.g., from other tissues, as well as SECP homologues from other mammals. The probe/primer typically comprises a substantially-purified oligonucleotide. The oligonucleotide typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 12, 25, 50, 100, 150, 200, 250, 300, 350 or 400 or more consecutive nucleotides of a sense-strand sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56; of an anti-sense strand sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56; or of a naturally occurring mutant of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56.

Probes based upon the human SECP nucleotide sequence can be used to detect transcripts or genomic sequences encoding the same or homologous proteins. In various embodiments, the probe further comprises a label group attached thereto, e.g., the label group can be a radioisotope, a fluorescent compound, an enzyme, or an enzyme co-factor. Such probes can be used as a part of a diagnostic test kit for identifying cells or tissue which mis-express a SECP protein, such as by measuring a level of a SECP-encoding nucleic acid in a sample of cells from a subject, e.g., detecting SECP mRNA levels or determining whether a genomic SECP gene has been mutated or deleted.

The term "a polypeptide having a biologically-active portion of SECP" refers to polypeptides exhibiting activity similar, but not necessarily identical, to an activity of a polypeptide of the invention, including mature forms, as measured in a particular biological assay, with or without dose dependency. A nucleic acid fragment encoding a "biologically-active portion of SECP" can be prepared by isolating a portion of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 that encodes a polypeptide having a SECP biological activity, expressing the encoded portion of SECP protein (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion of SECP.

SECP Variants 

The invention further encompasses nucleic acid molecules that differ from the disclosed SECP nucleotide sequences due to degeneracy of the genetic code. These nucleic acids therefore encode the same SECP protein as those encoded by the nucleotide sequence shown in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. In another embodiment, an isolated nucleic acid molecule of the invention has a nucleotide sequence encoding a protein having an amino acid sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56.
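Degeneracy of the genetic code means that different codons can specify the same amino acid, so two different nucleotide sequences can encode an identical protein. The sketch below builds the standard genetic code (NCBI translation table 1) from its usual 64-character string and translates two toy coding sequences that differ at three positions yet encode the same peptide. The function name `translate` and the toy sequences are hypothetical; this is an illustration, not a tool used in the application.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# CTG/TTA both encode Leu, AAA/AAG both encode Lys, TAG/TGA are both stops.
print(translate("ATGCTGAAATAG"))  # MLK*
print(translate("ATGTTAAAGTGA"))  # MLK*
```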

In addition to the human SECP nucleotide sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, it will be appreciated by those skilled in the art that DNA sequence polymorphisms that lead to changes in the amino acid sequences of SECP may exist within a population (e.g., the human population). Such genetic polymorphism in the SECP gene may exist among individuals within a population due to natural allelic variation. As used herein, the terms "gene" and "recombinant gene" refer to nucleic acid molecules comprising an open reading frame encoding a SECP protein, preferably a mammalian SECP protein. Such natural allelic variations can typically result in 1-5% variance in the nucleotide sequence of the SECP gene. Any and all such nucleotide variations and resulting amino acid polymorphisms in SECP that are the result of natural allelic variation and that do not alter the functional activity of SECP are intended to be within the scope of the invention.

Additionally, nucleic acid molecules encoding SECP proteins from other species, and thus that have a nucleotide sequence that differs from the human sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, are intended to be within the scope of the invention. Nucleic acid molecules corresponding to natural allelic variants and homologues of the SECP cDNAs of the invention can be isolated based on their homology to the human SECP nucleic acids disclosed herein using the human cDNAs, or a portion thereof, as a hybridization probe according to standard hybridization techniques under stringent hybridization conditions.

In another embodiment, an isolated nucleic acid molecule of the invention is at least 6 nucleotides in length and hybridizes under stringent conditions to the nucleic acid molecule comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56. In another embodiment, the nucleic acid is at least 10, 25, 50, 100, 250, 500 or 750 nucleotides in length. In yet another embodiment, an isolated nucleic acid molecule of the invention hybridizes to the coding region. As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences at least 60% homologous to each other typically remain hybridized to each other.

Homologs (i.e., nucleic acids encoding SECP proteins derived from species other than 
human) or other related sequences (e.g., paralogs) can be obtained by low, moderate or high 
stringency hybridization with all or a portion of the particular human sequence as a probe using 
methods well known in the art for nucleic acid hybridization and cloning. 

As used herein, the phrase "stringent hybridization conditions" refers to conditions under which a probe, primer or oligonucleotide will hybridize to its target sequence, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures than shorter sequences. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. Since the target sequences are generally present in excess, at Tm 50% of the probes are occupied at equilibrium. Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30°C for short probes, primers or oligonucleotides (e.g., 10 nt to 50 nt) and at least about 60°C for longer probes, primers and oligonucleotides. Stringent conditions may also be achieved with the addition of destabilizing agents, such as formamide.
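The rule of selecting stringent conditions about 5°C below Tm can be sketched with two widely used empirical Tm approximations. Neither formula is specified in the text above; both are assumptions chosen for illustration: the Wallace rule (2°C per A/T plus 4°C per G/C) for short oligonucleotides, and a salt-adjusted estimate, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 600/N, for longer duplexes. Nearest-neighbor models are more accurate, and formamide, mismatches and probe concentration are ignored here. All function names are hypothetical.

```python
import math


def wallace_tm(seq: str) -> float:
    """Wallace rule for short oligos: 2 degC per A/T and 4 degC per G/C."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def salt_adjusted_tm(seq: str, na_molar: float) -> float:
    """Empirical estimate for longer DNA duplexes (assumed formula):
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N."""
    s = seq.upper()
    n = len(s)
    pct_gc = 100.0 * (s.count("G") + s.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * pct_gc - 600.0 / n


def stringent_temperature(tm: float, offset: float = 5.0) -> float:
    """Hybridization temperature about `offset` degC below Tm, as described above."""
    return tm - offset


probe = "GC" * 25 + "AT" * 25  # hypothetical 100-nt probe, 50% GC
tm = salt_adjusted_tm(probe, na_molar=1.0)
print(tm, stringent_temperature(tm))  # 96.0 91.0
print(wallace_tm("ATGC"))             # 12.0
```

Lowering the sodium concentration lowers the estimated Tm through the log10 term, which is one reason low-salt washes are more stringent at a given temperature.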

Stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Preferably, the conditions are such that sequences at least about 65%, 70%, 75%, 85%, 90%, 95%, 98%, or 99% homologous to each other typically remain hybridized to each other. A non-limiting example of stringent hybridization conditions is hybridization in a high salt buffer comprising 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.02% BSA, and 500 mg/ml denatured salmon sperm DNA at 65°C. This hybridization is followed by one or more washes in 0.2X SSC, 0.01% BSA at 50°C. An isolated nucleic acid molecule of the invention that hybridizes under stringent conditions to the sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 corresponds to a naturally occurring nucleic acid molecule. As used herein, a "naturally-occurring" nucleic acid molecule refers to an RNA or DNA molecule having a nucleotide sequence that occurs in nature (e.g., encodes a natural protein).

In a second embodiment, a nucleic acid sequence that is hybridizable to the nucleic acid molecule comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, or fragments, analogs or derivatives thereof, under conditions of moderate stringency is provided. A non-limiting example of moderate stringency hybridization conditions is hybridization in 6X SSC, 5X Denhardt's solution, 0.5% SDS and 100 mg/ml denatured salmon sperm DNA at 55°C, followed by one or more washes in 1X SSC, 0.1% SDS at 37°C. Other conditions of moderate stringency that may be used are well known in the art. See, e.g., Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, NY, and Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, Stockton Press, NY.

In a third embodiment, a nucleic acid that is hybridizable to the nucleic acid molecule comprising the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or fragments, analogs or derivatives thereof, under conditions of low stringency, is provided. A non-limiting example of low stringency hybridization conditions is hybridization in 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA, 0.02% PVP, 0.02% Ficoll, 0.2% BSA, 100 mg/ml denatured salmon sperm DNA, 10% (wt/vol) dextran sulfate at 40°C, followed by one or more washes in 2X SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS at 50°C. Other conditions of low stringency that may be used are well known in the art (e.g., as employed for cross-species hybridizations). See, e.g., Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, NY; Kriegler, 1990, Gene Transfer and Expression, A Laboratory Manual, Stockton Press, NY; and Shilo and Weinberg, 1981, Proc. Natl. Acad. Sci. USA 78: 6789-6792.
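The high, moderate and low stringency examples above differ chiefly in the SSC strength and temperature of the wash. Converting SSC strength to sodium-ion molarity makes the washes comparable with the 0.01 to 1.0 M sodium range given for stringent conditions, or usable in a Tm estimate. The sketch assumes the common composition of 1X SSC (0.15 M NaCl plus 0.015 M trisodium citrate, i.e. about 0.195 M Na+), which the text does not state; the names `ssc_sodium_molar`, `WashCondition` and `WASHES` are hypothetical, and non-sodium components such as Tris, BSA and SDS are ignored.

```python
from dataclasses import dataclass

# Assumed composition: 1X SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
NA_PER_1X_SSC = 0.150 + 3 * 0.015  # molar Na+ in 1X SSC (about 0.195 M)


def ssc_sodium_molar(strength_x: float) -> float:
    """Approximate Na+ molarity contributed by an SSC buffer of the given X strength."""
    return strength_x * NA_PER_1X_SSC


@dataclass(frozen=True)
class WashCondition:
    label: str
    ssc_x: float
    temp_c: float


# Wash steps quoted in the three examples above (other components omitted).
WASHES = [
    WashCondition("high stringency", 0.2, 50.0),
    WashCondition("moderate stringency", 1.0, 37.0),
    WashCondition("low stringency", 2.0, 50.0),
]

for wash in WASHES:
    na = ssc_sodium_molar(wash.ssc_x)
    print(f"{wash.label}: {wash.ssc_x}X SSC ~ {na:.3f} M Na+ at {wash.temp_c} C")
```

The high stringency wash has the lowest sodium concentration (about 0.039 M), consistent with lower salt favoring only well-matched duplexes.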

Conservative Mutations

In addition to naturally-occurring allelic variants of the SECP sequence that may exist in the population, the skilled artisan will further appreciate that changes can be introduced by mutation into the nucleotide sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, thereby leading to changes in the amino acid sequence of the encoded SECP protein, without altering the functional ability of the SECP protein. For example, nucleotide substitutions leading to amino acid substitutions at "non-essential" amino acid residues can be made in the sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56. A "non-essential" amino acid residue is a residue that can be altered from the wild-type sequence of SECP without altering the biological activity, whereas an "essential" amino acid residue is required for biological activity. For example, amino acid residues that are conserved among the SECP proteins of the invention are predicted to be particularly non-amenable to such alteration.

Amino acid residues that are conserved among members of the SECP family are predicted to be less amenable to alteration. For example, a SECP protein according to the invention can contain at least one domain that is a typically conserved region in a SECP family member. As such, these conserved domains are not likely to be amenable to mutation. Other amino acid residues, however (e.g., those that are not conserved or only semi-conserved among members of the SECP family), may not be as essential for activity and thus are more likely to be amenable to alteration.

Another aspect of the invention pertains to nucleic acid molecules encoding SECP proteins that contain changes in amino acid residues that are not essential for activity. Such SECP proteins differ in amino acid sequence from any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, yet retain biological activity. In one embodiment, the isolated nucleic acid molecule comprises a nucleotide sequence encoding a protein, wherein the protein comprises an amino acid sequence at least about 75% homologous to the amino acid sequence of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57. Preferably, the protein encoded by the nucleic acid is at least about 80% homologous to any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, more preferably at least about 90%, 95%, 98%, and most preferably at least about 99% homologous to SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57.

An isolated nucleic acid molecule encoding a SECP protein homologous to the protein of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57 can be created by introducing one or more nucleotide substitutions, additions or deletions into the corresponding nucleotide sequence (i.e., SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56), such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein.

Mutations can be introduced into SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Preferably, conservative amino acid substitutions are made at one or more predicted non-essential amino acid residues. A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art. These families include amino acids with basic side chains (e.g., lysine, arginine, histidine), acidic side chains (e.g., aspartic acid, glutamic acid), uncharged polar side chains (e.g., glycine, asparagine, glutamine, serine, threonine, tyrosine, cysteine), nonpolar side chains (e.g., alanine, valine, leucine, isoleucine, proline, phenylalanine, methionine, tryptophan), β-branched side chains (e.g., threonine, valine, isoleucine) and aromatic side chains (e.g., tyrosine, phenylalanine, tryptophan, histidine). Thus, a predicted nonessential amino acid residue in SECP is replaced with another amino acid residue from the same side chain family. Alternatively, in another embodiment, mutations can be introduced randomly along all or part of a SECP coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for SECP biological activity to identify mutants that retain activity. Following mutagenesis of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, the encoded protein can be expressed by any recombinant technology known in the art and the activity of the protein can be determined.
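The side-chain families listed above can be encoded directly to test whether a proposed substitution is conservative, that is, whether the original and replacement residues share at least one family. The sketch below uses exactly the families and members named in the text (one-letter codes), so residues such as threonine, tyrosine and histidine belong to more than one family. The names `SIDE_CHAIN_FAMILIES`, `is_conservative` and `conservative_options` are hypothetical.

```python
from typing import Set

# Side-chain families exactly as listed in the text (one-letter codes).
SIDE_CHAIN_FAMILIES = {
    "basic": set("KRH"),
    "acidic": set("DE"),
    "uncharged polar": set("GNQSTYC"),
    "nonpolar": set("AVLIPFMW"),
    "beta-branched": set("TVI"),
    "aromatic": set("YFWH"),
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two (different) residues share at least one side-chain family."""
    return original != replacement and any(
        original in members and replacement in members
        for members in SIDE_CHAIN_FAMILIES.values()
    )


def conservative_options(residue: str) -> Set[str]:
    """All residues that would count as conservative replacements for `residue`."""
    options: Set[str] = set()
    for members in SIDE_CHAIN_FAMILIES.values():
        if residue in members:
            options |= members
    options.discard(residue)
    return options


print(is_conservative("K", "R"))      # True  (both basic)
print(is_conservative("D", "K"))      # False (acidic vs. basic)
print(is_conservative("T", "V"))      # True  (both beta-branched)
print(sorted(conservative_options("D")))  # ['E']
```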

In one embodiment, a mutant SECP protein can be assayed for: (i) the ability to form protein:protein interactions with other SECP proteins, other cell-surface proteins, or biologically-active portions thereof; (ii) complex formation between a mutant SECP protein and a SECP receptor; (iii) the ability of a mutant SECP protein to bind to an intracellular target protein or biologically active portion thereof (e.g., avidin proteins); (iv) the ability to bind BRA protein; or (v) the ability to specifically bind an anti-SECP protein antibody.

Antisense Nucleic Acids 

Another aspect of the invention pertains to isolated antisense nucleic acid molecules that are hybridizable to or complementary to the nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, or fragments, analogs or derivatives thereof. An "antisense" nucleic acid comprises a nucleotide sequence that is complementary to a "sense" nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule or complementary to an mRNA sequence. In specific aspects, antisense nucleic acid molecules are provided that comprise a sequence complementary to at least about 10, 25, 50, 100, 250 or 500 nucleotides or an entire SECP coding strand, or to only a portion thereof. Nucleic acid molecules encoding fragments, homologs, derivatives and analogs of a SECP protein of any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and 57, or antisense nucleic acids complementary to a SECP nucleic acid sequence encoding any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57, are additionally provided.

In one embodiment, an antisense nucleic acid molecule is antisense to a "coding region" of the coding strand of a nucleotide sequence encoding SECP. The term "coding region" refers to the region of the nucleotide sequence comprising codons which are translated into amino acid residues (e.g., the protein coding region of a human SECP that corresponds to any of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and 57).

In another embodiment, the antisense nucleic acid molecule is antisense to a "non-coding region" of the coding strand of a nucleotide sequence encoding SECP. The term "non-coding region" refers to the 5'- and 3'-terminal sequences that flank the coding region and are not translated into amino acids (also referred to as 5' and 3' non-translated regions).

Given the coding strand sequences encoding the SECP proteins disclosed herein (e.g., SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56), antisense nucleic acids of the invention can be designed according to the rules of Watson and Crick or Hoogsteen base-pairing. The antisense nucleic acid molecule can be complementary to the entire coding region of SECP mRNA, but more preferably is an oligonucleotide that is antisense to only a portion of the coding or non-coding region of SECP mRNA. For example, the antisense oligonucleotide can be complementary to the region surrounding the translation start site of SECP mRNA. An antisense oligonucleotide can be, for example, about 5, 10, 15, 20, 25, 30, 35, 40, 45 or 50 nucleotides in length. An antisense nucleic acid of the invention can be constructed by chemical synthesis or enzymatic ligation reactions using procedures known in the art. For example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be chemically synthesized using naturally-occurring nucleotides or variously modified nucleotides designed to increase the biological stability of the molecules or to increase the physical stability of the duplex formed between the antisense and sense nucleic acids; e.g., phosphorothioate derivatives and acridine-substituted nucleotides can be used.
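Under the Watson-Crick rule described above, an antisense oligonucleotide is the reverse complement of the targeted stretch of the coding strand. A minimal Python sketch follows, assuming a DNA coding-strand input and assuming, for illustration only, that the first ATG marks the translation start site (in practice the start site would be taken from the annotated coding region). The function names and the example sequence are hypothetical.

```python
# Watson-Crick complements for DNA; U is mapped to A so RNA input also works.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq):
    """Return the antiparallel complementary DNA strand, written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def antisense_around_start(coding_strand, length=20):
    """Design an antisense oligo of `length` nt spanning the first ATG.

    Assumption (hypothetical): the first ATG is the translation start site.
    """
    if not 5 <= length <= 50:
        raise ValueError("length outside the 5-50 nt range given in the text")
    seq = coding_strand.upper()
    if length > len(seq):
        raise ValueError("sequence shorter than requested oligo")
    start = seq.find("ATG")
    if start == -1:
        raise ValueError("no ATG found")
    # Center the window on the 3-nt start codon, then clamp it to the sequence ends.
    left = max(0, start - (length - 3) // 2)
    right = min(len(seq), left + length)
    left = right - length
    target = seq[left:right]
    return reverse_complement(target)


# Hypothetical coding-strand fragment, not a SECP sequence.
oligo = antisense_around_start("GCCACCATGGCTAGC", length=9)
print(oligo)  # AGCCATGGT
```

The resulting oligo contains CAT, the reverse complement of the ATG start codon, and pairs antiparallel with the target stretch. Hoogsteen (triplex) design and chemical modifications such as phosphorothioate linkages are outside this sketch.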

Examples of modified nucleotides that can be used to generate the antisense nucleic acid include: 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically using an expression vector into which a nucleic acid has been subcloned in an antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of an antisense orientation to a target nucleic acid of interest, described further in the following subsection).

The antisense nucleic acid molecules of the invention are typically administered to a subject or generated in situ such that they hybridize with or bind to cellular mRNA and/or genomic DNA encoding a SECP protein to thereby inhibit expression of the protein, e.g., by inhibiting transcription and/or translation. The hybridization can be by conventional nucleotide complementarity to form a stable duplex, or, for example, in the case of an antisense nucleic acid molecule that binds to DNA duplexes, through specific interactions in the major groove of the double helix. An example of a route of administration of antisense nucleic acid molecules of the invention includes direct injection at a tissue site. Alternatively, antisense nucleic acid molecules can be modified to target selected cells and then administered systemically. For example, for systemic administration, antisense molecules can be modified such that they specifically bind to receptors or antigens expressed on a selected cell surface (e.g., by linking the antisense nucleic acid molecules to peptides or antibodies that bind to cell surface receptors or antigens). The antisense nucleic acid molecules can also be delivered to cells using the vectors described herein. To achieve sufficient intracellular concentrations of antisense molecules, vector constructs in which the antisense nucleic acid molecule is placed under the control of a strong pol II or pol III promoter are preferred.

In yet another embodiment, the antisense nucleic acid molecule of the invention is an α-anomeric nucleic acid molecule. An α-anomeric nucleic acid molecule forms specific double-stranded hybrids with complementary RNA in which, contrary to the usual β-units, the strands run parallel to each other (see, Gaultier, et al., 1987. Nucl. Acids Res. 15: 6625-6641). The antisense nucleic acid molecule can also comprise a 2'-O-methylribonucleotide (Inoue, et al., 1987. Nucl. Acids Res. 15: 6131-6148) or a chimeric RNA-DNA analogue (Inoue, et al., 1987. FEBS Lett. 215: 327-330).

Ribozymes and PNA Moieties

Such modifications include, by way of non-limiting example, modified bases, and nucleic acids whose sugar phosphate backbones are modified or derivatized. These modifications are carried out at least in part to enhance the chemical stability of the modified nucleic acid, such that they may be used, for example, as antisense binding nucleic acids in therapeutic applications in a subject.

In still another embodiment, an antisense nucleic acid of the invention is a ribozyme. Ribozymes are catalytic RNA molecules with ribonuclease activity that are capable of cleaving a single-stranded nucleic acid, such as an mRNA, to which they have a complementary region. Thus, ribozymes (e.g., hammerhead ribozymes, described in Haselhoff and Gerlach, 1988. Nature 334: 585-591) can be used to catalytically cleave SECP mRNA transcripts and thereby inhibit translation of SECP mRNA. A ribozyme having specificity for a SECP-encoding nucleic acid can be designed based upon the nucleotide sequence of a SECP DNA disclosed herein (i.e., SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56). For example, a derivative of a Tetrahymena L-19 IVS RNA can be constructed in which the nucleotide sequence of the active site is complementary to the nucleotide sequence to be cleaved in a SECP-encoding mRNA. See, e.g., Cech, et al., U.S. Patent No. 4,987,071; and Cech, et al., U.S. Patent No. 5,116,742. Alternatively, SECP mRNA can be used to select a catalytic RNA having a specific ribonuclease activity from a pool of RNA molecules (Bartel, et al., 1993. Science 261: 1411-1418).

Alternatively, SECP gene expression can be inhibited by targeting nucleotide sequences complementary to the regulatory region of the SECP gene (e.g., the SECP promoter and/or enhancers) to form triple helical structures that prevent transcription of the SECP gene in target cells. See, e.g., Helene, 1991. Anticancer Drug Des. 6: 569-84; Helene, et al., 1992. Ann. N.Y. Acad. Sci. 660: 27-36; and Maher, 1992. BioEssays 14: 807-15.

In various embodiments, the nucleic acids of SECP can be modified at the base moiety, sugar moiety or phosphate backbone to improve, e.g., the stability, hybridization, or solubility of the molecule. For example, the deoxyribose phosphate backbone of the nucleic acids can be modified to generate peptide nucleic acids (Hyrup, et al., 1996. Bioorg. Med. Chem. 4: 5-23). As used herein, the terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide backbone and only the four natural nucleobases are retained. The neutral backbone of PNAs has been shown to allow for specific hybridization to DNA and RNA under conditions of low ionic strength. The synthesis of PNA oligomers can be performed using standard solid phase peptide synthesis protocols as described in Hyrup, et al., 1996, supra; Perry-O'Keefe, et al., 1996. Proc. Natl. Acad. Sci. USA 93: 14670-14675.

PNAs of SECP can be used in therapeutic and diagnostic applications. For example, PNAs can be used as antisense or antigene agents for sequence-specific modulation of gene expression by, e.g., inducing transcription or translation arrest or inhibiting replication. PNAs of SECP can also be used, e.g., in the analysis of single base pair mutations in a gene (e.g., by PNA-directed PCR clamping); as artificial restriction enzymes when used in combination with other enzymes, e.g., S1 nucleases (see, Hyrup, 1996, supra); or as probes or primers for DNA sequencing and hybridization (see, Hyrup, et al., 1996; Perry-O'Keefe, 1996, supra).

In another embodiment, PNAs of SECP can be modified, e.g., to enhance their stability or cellular uptake, by attaching lipophilic or other helper groups to PNA, by the formation of PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery known in the art. For example, PNA-DNA chimeras of SECP can be generated that may combine the advantageous properties of PNA and DNA. Such chimeras allow DNA recognition enzymes (e.g., RNase H and DNA polymerases) to interact with the DNA portion while the PNA portion provides high binding affinity and specificity. PNA-DNA chimeras can be linked using linkers of appropriate lengths selected in terms of base stacking, number of bonds between the nucleobases, and orientation (see, Hyrup, 1996, supra). The synthesis of PNA-DNA chimeras can be performed as described in Finn, et al., 1996. Nucl. Acids Res. 24: 3357-3363. For example, a DNA chain can be synthesized on a solid support using standard phosphoramidite coupling chemistry, and modified nucleoside analogs, e.g., 5'-(4-methoxytrityl)amino-5'-deoxy-thymidine phosphoramidite, can be used between the PNA and the 5' end of DNA (Mag, et al., 1989. Nucl. Acids Res. 17: 5973-5988). PNA monomers are then coupled in a stepwise manner to produce a chimeric molecule with a 5' PNA segment and a 3' DNA segment (see, Finn, et al., 1996, supra). Alternatively, chimeric molecules can be synthesized with a 5' DNA segment and a 3' PNA segment. See, e.g., Petersen, et al., 1975. Bioorg. Med. Chem. Lett. 5: 1119-1124.

In other embodiments, the oligonucleotide may include other appended groups such as peptides (e.g., for targeting host cell receptors in vivo), or agents facilitating transport across the cell membrane (see, e.g., Letsinger, et al., 1989. Proc. Natl. Acad. Sci. U.S.A. 86: 6553-6556; Lemaitre, et al., 1987. Proc. Natl. Acad. Sci. 84: 648-652; PCT Publication No. WO88/09810) or the blood-brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition, oligonucleotides can be modified with hybridization-triggered cleavage agents (see, e.g., Krol, et al., 1988. BioTechniques 6: 958-976) or intercalating agents (see, e.g., Zon, 1988. Pharm. Res. 5: 539-549). To this end, the oligonucleotide may be conjugated to another molecule, e.g., a peptide, a hybridization-triggered cross-linking agent, a transport agent, a hybridization-triggered cleavage agent, and the like.

Characterization of SECP Polypeptides 

A polypeptide according to the invention includes a polypeptide including the amino acid sequence of SECP polypeptides whose sequences are provided in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57. The invention also includes a mutant or variant protein any of whose residues may be changed from the corresponding residues shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57 while still maintaining its SECP activities and physiological functions, or a functional fragment thereof.

In general, a SECP variant that preserves SECP-like function includes any variant in which residues at a particular position in the sequence have been substituted by other amino acids, and further includes the possibility of inserting an additional residue or residues between two residues of the parent protein, as well as the possibility of deleting one or more residues from the parent sequence. Any amino acid substitution, insertion, or deletion is encompassed by the invention. In favorable circumstances, the substitution is a conservative substitution as defined above.

One aspect of the invention pertains to isolated SECP proteins, and biologically-active portions thereof, or derivatives, fragments, analogs or homologs thereof. Also provided are polypeptide fragments suitable for use as immunogens to raise anti-SECP antibodies. In one embodiment, native SECP proteins can be isolated from cells or tissue sources by an appropriate purification scheme using standard protein purification techniques. In another embodiment, SECP proteins are produced by recombinant DNA techniques. As an alternative to recombinant expression, a SECP protein or polypeptide can be synthesized chemically using standard peptide synthesis techniques.

An "isolated" or "purified" polypeptide or protein or biologically-active portion thereof is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the SECP protein is derived, or substantially free from chemical precursors or other chemicals when chemically synthesized. The language "substantially free of cellular material" includes preparations of SECP proteins in which the protein is separated from cellular components of the cells from which it is isolated or recombinantly-produced. In one embodiment, the language "substantially free of cellular material" includes preparations of SECP proteins having less than about 30% (by dry weight) of non-SECP proteins (also referred to herein as a "contaminating protein"), more preferably less than about 20% of non-SECP proteins, still more preferably less than about 10% of non-SECP proteins, and most preferably less than about 5% of non-SECP proteins. When the SECP protein or biologically-active portion thereof is recombinantly-produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 20%, more preferably less than about 10%, and most preferably less than about 5% of the volume of the SECP protein preparation.

The phrase "substantially free of chemical precursors or other chemicals" includes preparations of SECP protein in which the protein is separated from chemical precursors or other chemicals that are involved in the synthesis of the protein. In one embodiment, the language "substantially free of chemical precursors or other chemicals" includes preparations of SECP protein having less than about 30% (by dry weight) of chemical precursors or non-SECP chemicals, more preferably less than about 20% chemical precursors or non-SECP chemicals, still more preferably less than about 10% chemical precursors or non-SECP chemicals, and most preferably less than about 5% chemical precursors or non-SECP chemicals.

Biologically-active portions of a SECP protein include peptides comprising amino acid sequences sufficiently homologous to or derived from the amino acid sequence of the SECP protein that include fewer amino acids than the full-length SECP proteins and exhibit at least one activity of a SECP protein. Typically, biologically-active portions comprise a domain or motif with at least one activity of the SECP protein. A biologically-active portion of a SECP protein can be a polypeptide which is, for example, 10, 25, 50, 100 or more amino acids in length.

A biologically-active portion of a SECP protein of the invention may contain at least one of the above-identified conserved domains. Moreover, other biologically-active portions, in which other regions of the protein are deleted, can be prepared by recombinant techniques and evaluated for one or more of the functional activities of a native SECP protein.

In an embodiment, the SECP protein has an amino acid sequence shown in any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56. In other embodiments, the SECP protein is substantially homologous to any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional activity of the protein of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56, yet differs in amino acid sequence due to natural allelic variation or mutagenesis, as described in detail below. Accordingly, in another embodiment, the SECP protein is a protein that comprises an amino acid sequence at least about 45% homologous, and more preferably about 55, 65, 70, 75, 80, 85, 90, 95, 98 or even 99% homologous, to the amino acid sequence of any of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 and retains the functional activity of the SECP proteins of the corresponding polypeptide having the sequence of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56.

Determining Homology Between Two or More Sequences 

To determine the percent homology of two amino acid sequences or of two nucleic acids, the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in the sequence of a first amino acid or nucleic acid sequence for optimal alignment with a second amino acid or nucleic acid sequence). The amino acid residues or nucleotides at corresponding amino acid positions or nucleotide positions are then compared. When a position in the first sequence is occupied by the same amino acid residue or nucleotide as the corresponding position in the second sequence, then the molecules are homologous at that position (i.e., as used herein amino acid or nucleic acid "homology" is equivalent to amino acid or nucleic acid "identity").

The nucleic acid sequence homology may be determined as the degree of identity between two sequences. The homology may be determined using computer programs known in the art, such as GAP software provided in the GCG program package. See, Needleman and Wunsch, 1970. J. Mol. Biol. 48: 443-453. When compared using GCG GAP software with a GAP creation penalty of 5.0 and a GAP extension penalty of 0.3 for nucleic acid sequences, the coding region of the analogous nucleic acid sequences referred to above exhibits a degree of identity preferably of at least 70%, 75%, 80%, 85%, 90%, 95%, 98%, or 99% with the CDS (encoding) part of the DNA sequence shown in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56.
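The GAP program is based on the Needleman-Wunsch global alignment method cited above. The Python sketch below shows only the basic dynamic-programming idea: it uses a single linear gap penalty and hypothetical match, mismatch and gap scores, so it does not reproduce GCG GAP's affine scoring (gap creation 5.0, gap extension 0.3) or its scoring matrices. The function name and parameter values are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (simplified Needleman-Wunsch).

    Returns (aligned_a, aligned_b, score); '-' marks a gap.
    Scores are hypothetical defaults, not GCG GAP settings.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```

An affine-gap variant (separate matrices for opening and extending a gap) would be needed to mirror the GAP creation and extension penalties described in the text.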

The term "sequence identity" refers to the degree to which two polynucleotide or polypeptide sequences are identical on a residue-by-residue basis over a particular region of comparison. The term "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over that region of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, U, or I, in the case of nucleic acids) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the region of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. The term "substantial identity" as used herein denotes a characteristic of a polynucleotide sequence, wherein the polynucleotide comprises a sequence that has at least 80 percent sequence identity, preferably at least 85 percent identity and often 90 to 95 percent sequence identity, more usually at least 99 percent sequence identity as compared to a reference sequence over a comparison region.
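The percentage-of-sequence-identity calculation defined above can be written directly. This is a minimal Python sketch assuming the two sequences have already been optimally aligned (with '-' marking gaps) and that the comparison window is the full alignment; gap positions count toward the window size but never as matches. `percent_identity` and `has_substantial_identity` are hypothetical names, and the 80% default threshold is taken from the definition of "substantial identity" in the text.

```python
GAP = "-"


def percent_identity(aligned_a, aligned_b):
    """(matched positions / window size) * 100, per the definition in the text."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty comparison window")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != GAP  # identical base or residue at this position
    )
    return 100.0 * matches / len(aligned_a)


def has_substantial_identity(aligned_a, aligned_b, threshold=80.0):
    """True if identity over the window meets the 'substantial identity' floor."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGT", "A-GT"))  # 75.0
```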

Chimeric and Fusion Proteins

The invention also provides SECP chimeric or fusion proteins. As used herein, a SECP "chimeric protein" or "fusion protein" comprises a SECP polypeptide operatively-linked to a non-SECP polypeptide. An "SECP polypeptide" refers to a polypeptide having an amino acid sequence corresponding to a SECP protein shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57, whereas a "non-SECP polypeptide" refers to a polypeptide having an amino acid sequence corresponding to a protein that is not substantially homologous to the SECP protein (e.g., a protein that is different from the SECP protein and that is derived from the same or a different organism). Within a SECP fusion protein the SECP polypeptide can correspond to all or a portion of a SECP protein. In one embodiment, a SECP fusion protein comprises at least one biologically-active portion of a SECP protein. In another embodiment, a SECP fusion protein comprises at least two biologically-active portions of a SECP protein. In yet another embodiment, a SECP fusion protein comprises at least three biologically-active portions of a SECP protein. Within the fusion protein, the term "operatively-linked" is intended to indicate that the SECP polypeptide and the non-SECP polypeptide are fused in-frame with one another. The non-SECP polypeptide can be fused to the amino-terminus or carboxyl-terminus of the SECP polypeptide.
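Operative linkage means the two coding sequences share a single reading frame. As a rough Python sketch (hypothetical helper names and toy sequences, not SECP or GST sequences), an in-frame junction can be checked by requiring the upstream coding sequence to be a whole number of codons and free of internal stop codons once its own terminal stop codon is removed:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds):
    """Split a coding sequence into consecutive triplets (frame 0)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def fuse_in_frame(upstream_cds, downstream_cds):
    """Join two coding sequences so the downstream one stays in frame.

    The upstream terminal stop codon, if present, is dropped so translation
    reads through into the downstream sequence.
    """
    up = upstream_cds.upper()
    down = downstream_cds.upper()
    if len(up) % 3 != 0:
        raise ValueError("upstream length is not a multiple of 3; fusion would shift frame")
    up_codons = codons(up)
    if up_codons and up_codons[-1] in STOP_CODONS:
        up_codons = up_codons[:-1]
    if any(c in STOP_CODONS for c in up_codons):
        raise ValueError("internal stop codon in upstream sequence")
    return "".join(up_codons) + down


print(fuse_in_frame("ATGAAATAA", "GGCTGA"))  # ATGAAAGGCTGA
```

Whether a linker is inserted at the junction, and which partner sits at the amino-terminus, are construct-specific choices not addressed by this sketch.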

In one embodiment, the fusion protein is a GST-SECP fusion protein in which the SECP sequences are fused to the carboxyl-terminus of the GST (glutathione S-transferase) sequences. Such fusion proteins can facilitate the purification of recombinant SECP polypeptides.

In another embodiment, the fusion protein is a SECP protein containing a heterologous signal sequence at its amino-terminus. In certain host cells (e.g., mammalian host cells), expression and/or secretion of SECP can be increased through use of a heterologous signal sequence.

In yet another embodiment, the fusion protein is a SECP-immunoglobulin fusion protein in which the SECP sequences are fused to sequences derived from a member of the immunoglobulin protein family. The SECP-immunoglobulin fusion proteins of the invention can be incorporated into pharmaceutical compositions and administered to a subject to inhibit an interaction between a SECP ligand and a SECP protein on the surface of a cell, to thereby suppress SECP-mediated signal transduction in vivo. The SECP-immunoglobulin fusion proteins can be used to affect the bioavailability of a SECP cognate ligand. Inhibition of the SECP ligand/SECP interaction may be useful therapeutically both for the treatment of proliferative and differentiative disorders and for modulating (e.g., promoting or inhibiting) cell survival. Moreover, the SECP-immunoglobulin fusion proteins of the invention can be used as immunogens to produce anti-SECP antibodies in a subject, to purify SECP ligands, and in screening assays to identify molecules that inhibit the interaction of SECP with a SECP ligand.

A SECP chimeric or fusion protein of the invention can be produced by standard recombinant DNA techniques. For example, DNA fragments coding for the different polypeptide sequences are ligated together in-frame in accordance with conventional techniques, e.g., by employing blunt-ended or stagger-ended termini for ligation, restriction enzyme digestion to provide for appropriate termini, filling-in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and enzymatic ligation. In another embodiment, the fusion gene can be synthesized by conventional techniques including automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can be carried out using anchor primers that give rise to complementary overhangs between two consecutive gene fragments that can subsequently be annealed and reamplified to generate a chimeric gene sequence (see, e.g., Ausubel, et al. (eds.) Current Protocols in Molecular Biology, John Wiley & Sons, 1992). Moreover, many expression vectors are commercially available that already encode a fusion moiety (e.g., a GST polypeptide). A SECP-encoding nucleic acid can be cloned into such an expression vector such that the fusion moiety is linked in-frame to the SECP protein.

SECP Agonists and Antagonists 

The invention also pertains to variants of the SECP proteins that function as either SECP agonists (i.e., mimetics) or as SECP antagonists. Variants of the SECP protein can be generated by mutagenesis (e.g., discrete point mutation or truncation of the SECP protein). An agonist of a SECP protein can retain substantially the same, or a subset of, the biological activities of the naturally-occurring form of a SECP protein. An antagonist of a SECP protein can inhibit one or more of the activities of the naturally occurring form of a SECP protein by, for example, competitively binding to a downstream or upstream member of a cellular signaling cascade which includes the SECP protein. Thus, specific biological effects can be elicited by treatment with a variant of limited function. In one embodiment, treatment of a subject with a variant having a subset of the biological activities of the naturally occurring form of the protein has fewer side effects in a subject relative to treatment with the naturally occurring form of the SECP proteins.

Variants of the SECP proteins that function as either SECP agonists (i.e., mimetics) or as SECP antagonists can be identified by screening combinatorial libraries of mutants (e.g., truncation mutants) of the SECP proteins for SECP protein agonist or antagonist activity. In one embodiment, a variegated library of SECP variants is generated by combinatorial mutagenesis at the nucleic acid level and is encoded by a variegated gene library. A variegated library of SECP variants can be produced by, for example, enzymatically ligating a mixture of synthetic oligonucleotides into gene sequences such that a degenerate set of potential SECP sequences is expressible as individual polypeptides, or alternatively, as a set of larger fusion proteins (e.g., for phage display) containing the set of SECP sequences therein. There are a variety of methods which can be used to produce libraries of potential SECP variants from a degenerate oligonucleotide sequence. Chemical synthesis of a degenerate gene sequence can be performed in an automatic DNA synthesizer, and the synthetic gene then ligated into an appropriate expression vector. Use of a degenerate set of genes allows for the provision, in one mixture, of all of the sequences encoding the desired set of potential SECP sequences. Methods for synthesizing degenerate oligonucleotides are well known in the art. See, e.g., Narang, 1983. Tetrahedron 39: 3; Itakura, et al., 1984. Annu. Rev. Biochem. 53: 323; Itakura, et al., 1984. Science 198: 1056; Ike, et al., 1983. Nucl. Acids Res. 11: 477.
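A degenerate oligonucleotide is conveniently written with IUPAC ambiguity codes, and the set of sequences it encodes is the Cartesian product of the bases allowed at each position. The Python sketch below (hypothetical function names; standard IUPAC nucleotide codes) computes the size of that set and, for small libraries, enumerates it. For example, the commonly used NNK codon design (K = G or T) gives 4 × 4 × 2 = 32 codons, which cover all 20 amino acids plus the single stop codon TAG.

```python
import itertools
import math

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(degenerate):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    return math.prod(len(IUPAC[c]) for c in degenerate.upper())


def expand_degenerate(degenerate):
    """Enumerate every concrete sequence (practical only for small libraries)."""
    choices = [IUPAC[c] for c in degenerate.upper()]
    return ["".join(p) for p in itertools.product(*choices)]


print(library_size("NNK"))       # 32
print(expand_degenerate("ATR"))  # ['ATA', 'ATG']
```

Computing the size without enumerating matters in practice, since library size grows multiplicatively with each degenerate position.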

Polypeptide Libraries 

In addition, libraries of fragments of the SECP protein coding sequences can be used to generate a variegated population of SECP fragments for screening and subsequent selection of variants of a SECP protein. In one embodiment, a library of coding sequence fragments can be generated by treating a double-stranded PCR fragment of a SECP coding sequence with a nuclease under conditions wherein nicking occurs only about once per molecule, denaturing the double-stranded DNA, renaturing the DNA to form double-stranded DNA that can include sense/antisense pairs from different nicked products, removing single-stranded portions from reformed duplexes by treatment with S1 nuclease, and ligating the resulting fragment library into an expression vector. By this method, expression libraries can be derived which encode amino-terminal and internal fragments of various sizes of the SECP proteins.

Various techniques are known in the art for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a selected property. Such techniques are adaptable for rapid screening of the gene libraries generated by the combinatorial mutagenesis of SECP proteins. The most widely used techniques for screening large gene libraries, which are amenable to high-throughput analysis, typically include cloning the gene library into replicable expression vectors, transforming appropriate cells with the resulting library of vectors, and expressing the combinatorial genes under conditions in which detection of a desired activity facilitates isolation of the vector encoding the gene whose product was detected. Recursive ensemble mutagenesis (REM), a technique that enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify SECP variants. See, e.g., Arkin and Youvan, 1992. Proc. Natl. Acad. Sci. USA 89: 7811-7815; Delgrave, et al., 1993. Protein Engineering 6: 327-331.
Anti-SECP Antibodies

The invention encompasses antibodies and antibody fragments, such as Fab or F(ab')2, that bind immunospecifically to any of the SECP polypeptides of the invention.

An isolated SECP protein, or a portion or fragment thereof, can be used as an immunogen to generate antibodies that bind to SECP polypeptides using standard techniques for polyclonal and monoclonal antibody preparation. The full-length SECP proteins can be used or, alternatively, the invention provides antigenic peptide fragments of SECP proteins for use as immunogens. The antigenic SECP peptides comprise at least 4 amino acid residues of the amino acid sequence shown in SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57, and encompass an epitope of SECP such that an antibody raised against the peptide forms a specific immune complex with SECP. Preferably, the antigenic peptide comprises at least 6, 8, 10, 15, 20, or 30 amino acid residues. Longer antigenic peptides are sometimes preferable over shorter antigenic peptides, depending on use and according to methods well known to one skilled in the art.
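The minimum-length language above can be pictured as a set of overlapping windows along a SECP sequence. A minimal sketch, assuming candidate immunogens are simply tiled along the sequence; the function name and the toy sequence are hypothetical placeholders, not sequences from the SEQ ID NOs.

```python
def overlapping_peptides(sequence: str, length: int, step: int = 1) -> list[tuple[int, str]]:
    """List (1-based start position, peptide) for each window of `length` residues."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive integers")
    return [
        (start + 1, seq[start:start + length])
        for start in range(0, len(seq) - length + 1, step)
    ]


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # arbitrary placeholder, not a SECP sequence
    for pos, pep in overlapping_peptides(toy, 4, step=2):
        print(pos, pep)
```

Raising `length` from 4 toward 30 residues trades the number of candidate peptides for longer, potentially more immunogenic ones.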

In certain embodiments of the invention, at least one epitope encompassed by the antigenic peptide is a region of SECP that is located on the surface of the protein (e.g., a hydrophilic region). As a means for targeting antibody production, hydropathy plots showing regions of hydrophilicity and hydrophobicity may be generated by any method well known in the art, including, for example, the Kyte-Doolittle or the Hopp-Woods methods, either with or without Fourier transformation (see, e.g., Hopp and Woods, 1981. Proc. Natl. Acad. Sci. USA 78: 3824-3828; Kyte and Doolittle, 1982. J. Mol. Biol. 157: 105-142, each of which is incorporated herein by reference in its entirety).
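A Kyte-Doolittle hydropathy plot is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale. The default window of 9 residues and the -1.0 cutoff used to flag hydrophilic (possibly surface-exposed) stretches are illustrative assumptions, and the function names are hypothetical. Fourier smoothing and the Hopp-Woods scale are not shown.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982); positive = hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list[tuple[int, float]]:
    """Sliding-window mean hydropathy as (1-based center position, score).

    Only full windows are scored; use an odd `window` so the center is defined.
    Raises KeyError on non-standard residue letters.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    total = sum(values[:window])
    profile = [(half + 1, total / window)]
    for start in range(1, len(values) - window + 1):
        # Slide one residue: add the new right-hand value, drop the old left-hand one.
        total += values[start + window - 1] - values[start - 1]
        profile.append((start + half + 1, total / window))
    return profile


def hydrophilic_regions(sequence: str, window: int = 9, threshold: float = -1.0) -> list[int]:
    """Center positions whose window mean is at or below a (hypothetical) cutoff."""
    return [pos for pos, score in hydropathy_profile(sequence, window) if score <= threshold]
```

Peptides centered on the flagged positions would be candidate immunogens under this assumption; plotting the profile gives the conventional hydropathy plot.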

As disclosed herein, SECP protein sequences of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55, and/or 57, or derivatives, fragments, analogs, or homologs thereof, may be utilized as immunogens in the generation of antibodies that immunospecifically bind these protein components. The term "antibody" as used herein refers to immunoglobulin molecules and immunologically active portions of immunoglobulin molecules, i.e., molecules that contain an antigen-binding site that specifically binds (immunoreacts with) an antigen, such as SECP. Such antibodies include, but are not limited to, polyclonal, monoclonal, chimeric, single-chain, Fab and F(ab')2 fragments, and an Fab expression library. In a specific embodiment, antibodies to human SECP proteins are disclosed. Various procedures known within the art may be used for the production of polyclonal or monoclonal antibodies to a SECP protein sequence of SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 41, 43, 45, 47, 49, 51, 53, 55 and/or 57, or a derivative, fragment, analog, or homolog thereof.

For the production of polyclonal antibodies, various suitable host animals (e.g., rabbit, goat, mouse or other mammal) may be immunized by injection with the native protein, or a synthetic variant thereof, or a derivative of the foregoing. An appropriate immunogenic preparation can contain, for example, recombinantly expressed SECP protein or a chemically synthesized SECP polypeptide. The preparation can further include an adjuvant. Various adjuvants used to increase the immunological response include, but are not limited to, Freund's (complete and incomplete), mineral gels (e.g., aluminum hydroxide), surface active substances (e.g., lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, dinitrophenol, etc.), human adjuvants such as Bacille Calmette-Guerin and Corynebacterium parvum, or similar immunostimulatory agents. If desired, the antibody molecules directed against SECP can be isolated from the mammal (e.g., from the blood) and further purified by well-known techniques, such as protein A chromatography to obtain the IgG fraction.

The term "monoclonal antibody" or "monoclonal antibody composition", as used herein, refers to a population of antibody molecules that contain only one species of an antigen-binding site capable of immunoreacting with a particular epitope of SECP. A monoclonal antibody composition thus typically displays a single binding affinity for a particular SECP protein with which it immunoreacts. For preparation of monoclonal antibodies directed towards a particular SECP protein, or derivatives, fragments, analogs or homologs thereof, any technique that provides for the production of antibody molecules by continuous cell line culture may be utilized. Such techniques include, but are not limited to, the hybridoma technique (see, e.g., Kohler & Milstein, 1975. Nature 256: 495-497); the trioma technique; the human B-cell hybridoma technique (see, e.g., Kozbor, et al., 1983. Immunol. Today 4: 72); and the EBV hybridoma technique to produce human monoclonal antibodies (see, e.g., Cole, et al., 1985. In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Human monoclonal antibodies may be utilized in the practice of the invention and may be produced by using human hybridomas (see, e.g., Cote, et al., 1983. Proc. Natl. Acad. Sci. USA 80: 2026-2030) or by transforming human B-cells with Epstein-Barr virus in vitro (see, e.g., Cole, et al., 1985. In: Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Each of the above citations is incorporated herein by reference in its entirety.

According to the invention, techniques can be adapted for the production of single-chain antibodies specific to a SECP protein (see, e.g., U.S. Patent No. 4,946,778). In addition, methods can be adapted for the construction of Fab expression libraries (see, e.g., Huse, et al., 1989. Science 246: 1275-1281) to allow rapid and effective identification of monoclonal Fab fragments with the desired specificity for a SECP protein or derivatives, fragments, analogs or homologs thereof. Non-human antibodies can be "humanized" by techniques well known in the art. See, e.g., U.S. Patent No. 5,225,539. Antibody fragments that contain the idiotypes to a SECP protein may be produced by techniques known in the art including, but not limited to: (i) an F(ab')2 fragment produced by pepsin digestion of an antibody molecule; (ii) an Fab fragment generated by reducing the disulfide bridges of an F(ab')2 fragment; (iii) an Fab fragment generated by the treatment of the antibody molecule with papain and a reducing agent; and (iv) Fv fragments.

Additionally, recombinant anti-SECP antibodies, such as chimeric and humanized monoclonal antibodies, comprising both human and non-human portions, which can be made using standard recombinant DNA techniques, are within the scope of the invention. Such chimeric and humanized monoclonal antibodies can be produced by recombinant DNA techniques known in the art, for example using methods described in International Application No. PCT/US86/02269; European Patent Application No. 184,187; European Patent Application No. 171,496; European Patent Application No. 173,494; PCT International Publication No. WO 86/01533; U.S. Patent No. 4,816,567; U.S. Pat. No. 5,225,539; European Patent Application No. 125,023; Better, et al., 1988. Science 240: 1041-1043; Liu, et al., 1987. Proc. Natl. Acad. Sci. USA 84: 3439-3443; Liu, et al., 1987. J. Immunol. 139: 3521-3526; Sun, et al., 1987. Proc. Natl. Acad. Sci. USA 84: 214-218; Nishimura, et al., 1987. Cancer Res. 47: 999-1005; Wood, et al., 1985. Nature 314: 446-449; Shaw, et al., 1988. J. Natl. Cancer Inst. 80: 1553-1559; Morrison, 1985. Science 229: 1202-1207; Oi, et al., 1986. BioTechniques 4: 214; Jones, et al., 1986. Nature 321: 522-525; Verhoeyen, et al., 1988. Science 239: 1534; and Beidler, et al., 1988. J. Immunol. 141: 4053-4060. Each of the above citations is incorporated herein by reference in its entirety.

In one embodiment, methods for the screening of antibodies that possess the desired specificity include, but are not limited to, enzyme-linked immunosorbent assay (ELISA) and other immunologically mediated techniques known within the art. In a specific embodiment, selection of antibodies that are specific to a particular domain of a SECP protein is facilitated by generation of hybridomas that bind to the fragment of a SECP protein possessing such a domain. Thus, antibodies that are specific for a desired domain within a SECP protein, or derivatives, fragments, analogs or homologs thereof, are also provided herein.
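The domain-specific screen can be expressed as a simple filter over ELISA readings, comparing each hybridoma supernatant on the domain-bearing fragment against a fragment lacking the domain. This is a hypothetical sketch: the data layout, the use of the blank well as a floor and the 3-fold thresholds are assumptions for illustration, not values given in this specification.

```python
from dataclasses import dataclass


@dataclass
class ElisaReading:
    clone_id: str
    target_od: float   # well coated with the fragment carrying the domain of interest
    control_od: float  # well coated with a fragment lacking that domain


def select_domain_specific(readings, blank_od, min_over_blank=3.0, min_over_control=3.0):
    """Return clone IDs whose target signal clears both (hypothetical) fold thresholds."""
    selected = []
    for r in readings:
        # Use the blank as a floor so a near-zero control reading cannot inflate the ratio.
        control_floor = max(r.control_od, blank_od)
        if (r.target_od >= min_over_blank * blank_od
                and r.target_od >= min_over_control * control_floor):
            selected.append(r.clone_id)
    return selected


if __name__ == "__main__":
    plate = [
        ElisaReading("A", 1.5, 0.1),   # strong and domain-specific
        ElisaReading("B", 1.5, 1.2),   # binds both fragments
        ElisaReading("C", 0.2, 0.05),  # weak overall
    ]
    print(select_domain_specific(plate, blank_od=0.1))
```

Requiring a margin over the control fragment, and not only over background, is what separates domain-specific clones from clones that bind elsewhere on SECP.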
Anti-SECP antibodies may be used in methods known within the art relating to the localization and/or quantitation of a SECP protein (e.g., for use in measuring levels of the SECP protein within appropriate physiological samples, for use in diagnostic methods, for use in imaging the protein, and the like). In a given embodiment, antibodies for SECP proteins, or derivatives, fragments, analogs or homologs thereof, that contain the antibody-derived binding domain, are utilized as pharmacologically active compounds (hereinafter "Therapeutics").

An anti-SECP antibody (e.g., monoclonal antibody) can be used to isolate a SECP polypeptide by standard techniques, such as affinity chromatography or immunoprecipitation. An anti-SECP antibody can facilitate the purification of natural SECP polypeptide from cells and of recombinantly produced SECP polypeptide expressed in host cells. Moreover, an anti-SECP antibody can be used to detect SECP protein (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance and pattern of expression of the SECP protein. Anti-SECP antibodies can be used diagnostically to monitor protein levels in tissue as part of a clinical testing procedure, e.g., to determine the efficacy of a given treatment regimen. Detection can be facilitated by coupling (i.e., physically linking) the antibody to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include ¹²⁵I, ³⁵S or ³H.

SECP Recombinant Expression Vectors and Host Cells

Another aspect of the invention pertains to vectors, preferably expression vectors, containing a nucleic acid encoding a SECP protein, or derivatives, fragments, analogs or homologs thereof. As used herein, the term "vector" refers to a nucleic acid molecule capable of transporting another nucleic acid to which it has been linked. One type of vector is a "plasmid", which refers to a circular double-stranded DNA loop into which additional DNA segments can be ligated. Another type of vector is a viral vector, wherein additional DNA segments can be ligated into the viral genome. Certain vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell, and thereby are replicated along with the host genome. Moreover, certain vectors are capable of directing the expression of genes to which they are operatively linked. Such vectors are referred to herein as "expression vectors". In general, expression vectors of utility in recombinant DNA techniques are often in the form of plasmids. In the present specification, "plasmid" and "vector" can be used interchangeably, as the plasmid is the most commonly used form of vector. However, the invention is intended to include such other forms of expression vectors, such as viral vectors (e.g., replication-defective retroviruses, adenoviruses and adeno-associated viruses), which serve equivalent functions.

The recombinant expression vectors of the invention comprise a nucleic acid of the invention in a form suitable for expression of the nucleic acid in a host cell, which means that the recombinant expression vectors include one or more regulatory sequences, selected on the basis of the host cells to be used for expression, that are operatively linked to the nucleic acid sequence to be expressed. Within a recombinant expression vector, "operably linked" is intended to mean that the nucleotide sequence of interest is linked to the regulatory sequence(s) in a manner that allows for expression of the nucleotide sequence (e.g., in an in vitro transcription/translation system or in a host cell when the vector is introduced into the host cell).

The phrase "regulatory sequence" is intended to include promoters, enhancers and other expression control elements (e.g., polyadenylation signals). Such regulatory sequences are described, for example, in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Regulatory sequences include those that direct constitutive expression of a nucleotide sequence in many types of host cell and those that direct expression of the nucleotide sequence only in certain host cells (e.g., tissue-specific regulatory sequences). It will be appreciated by those skilled in the art that the design of the expression vector can depend on such factors as the choice of the host cell to be transformed, the level of expression of protein desired, etc. The expression vectors of the invention can be introduced into host cells to thereby produce proteins or peptides, including fusion proteins or peptides, encoded by nucleic acids as described herein (e.g., SECP proteins, mutant forms of SECP proteins, fusion proteins, etc.).

The recombinant expression vectors of the invention can be designed for expression of SECP proteins in prokaryotic or eukaryotic cells. For example, SECP proteins can be expressed in bacterial cells such as Escherichia coli, insect cells (using baculovirus expression vectors), yeast cells or mammalian cells. Suitable host cells are discussed further in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Alternatively, the recombinant expression vector can be transcribed and translated in vitro, for example using T7 promoter regulatory sequences and T7 polymerase.

Expression of proteins in prokaryotes is most often carried out in Escherichia coli with vectors containing constitutive or inducible promoters directing the expression of either fusion or non-fusion proteins. Fusion vectors add a number of amino acids to a protein encoded therein, usually to the amino terminus of the recombinant protein. Such fusion vectors typically serve three purposes: (i) to increase expression of recombinant protein; (ii) to increase the solubility of the recombinant protein; and (iii) to aid in the purification of the recombinant protein by acting as a ligand in affinity purification. Often, in fusion expression vectors, a proteolytic cleavage site is introduced at the junction of the fusion moiety and the recombinant protein to enable separation of the recombinant protein from the fusion moiety subsequent to purification of the fusion protein. Such proteolytic enzymes, and their cognate recognition sequences, include Factor Xa, thrombin, and enterokinase. Typical fusion expression vectors include pGEX (Pharmacia Biotech Inc.; Smith and Johnson, 1988. Gene 67: 31-40), pMAL (New England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.), which fuse glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target recombinant protein.
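Before expressing a fusion construct, one can check where the engineered protease sites fall. The sketch below scans a protein sequence for the commonly used recognition sequences Ile-Glu/Asp-Gly-Arg (Factor Xa, cleaving after Arg), Leu-Val-Pro-Arg-Gly-Ser (thrombin, cleaving between Arg and Gly) and Asp-Asp-Asp-Asp-Lys (enterokinase, cleaving after Lys). It is a simplified string search under those assumptions; these proteases can also cut at secondary sites that a motif scan will not predict. The function name and the test sequences are hypothetical.

```python
import re

# (regex for the recognition sequence, offset of the cut within the match)
PROTEASE_SITES = {
    "Factor Xa": (r"I[ED]GR", 4),    # cuts after Arg of Ile-(Glu/Asp)-Gly-Arg
    "thrombin": (r"LVPRGS", 4),      # cuts between Arg and Gly
    "enterokinase": (r"DDDDK", 5),   # cuts after Lys
}


def find_cleavage_sites(protein: str) -> list[tuple[str, int]]:
    """Return (protease, cut index) pairs; protein[cut:] is the released C-terminal part."""
    seq = protein.upper()
    hits = []
    for name, (pattern, cut_offset) in PROTEASE_SITES.items():
        # A zero-width lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start() + cut_offset))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    fusion = "MSPILGYW" + "LVPRGS" + "MKTAYIAK"  # placeholder tag + thrombin site + target
    for protease, cut in find_cleavage_sites(fusion):
        print(protease, cut, fusion[cut:])
```

An extra, unintended hit inside the target protein itself would be a reason to choose a different protease for the construct.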

Examples of suitable inducible non-fusion Escherichia coli expression vectors include pTrc (Amann, et al., 1988. Gene 69: 301-315) and pET 11d (Studier, et al., Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 60-89).

One strategy to maximize recombinant protein expression in Escherichia coli is to express the protein in a host bacterium with an impaired capacity to proteolytically cleave the recombinant protein. See, e.g., Gottesman, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990) 119-128. Another strategy is to alter the sequence of the nucleic acid to be inserted into an expression vector so that the individual codons for each amino acid are those preferentially utilized in Escherichia coli (see, e.g., Wada, et al., 1992. Nucl. Acids Res. 20: 2111-2118). Such alteration of nucleic acid sequences of the invention can be carried out by standard DNA synthesis techniques.
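Codon replacement of the kind described above can be sketched as back-translation with one preferred codon per amino acid. The codon table below is a hypothetical illustration: each codon does encode the listed residue under the standard genetic code, but whether it is the most frequent choice should be checked against a codon-usage table for the actual E. coli strain. `back_translate` is an assumed helper name.

```python
# Hypothetical one-codon-per-residue table for illustration only; confirm codon
# preferences against a current codon-usage table for the expression strain.
# Every codon listed encodes its residue under the standard genetic code ("*" = stop).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Encode a protein sequence using one preferred codon per amino acid."""
    codons = []
    for aa in protein.upper():
        if aa not in PREFERRED_CODON:
            raise ValueError(f"unsupported residue symbol: {aa!r}")
        codons.append(PREFERRED_CODON[aa])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKTAYIAK*"))
```

Practical codon optimization usually also weighs mRNA secondary structure, GC content and unwanted restriction sites, which this sketch ignores.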
In another embodiment, the SECP expression vector is a yeast expression vector. Examples of vectors for expression in the yeast Saccharomyces cerevisiae include pYepSec1 (Baldari, et al., 1987. EMBO J. 6: 229-234), pMFa (Kurjan and Herskowitz, 1982. Cell 30: 933-943), pJRY88 (Schultz, et al., 1987. Gene 54: 113-123), pYES2 (Invitrogen Corporation, San Diego, Calif.), and picZ (Invitrogen Corporation, San Diego, Calif.).

Alternatively, SECP can be expressed in insect cells using baculovirus expression vectors. Baculovirus vectors available for expression of proteins in cultured insect cells (e.g., Sf9 cells) include the pAc series (Smith, et al., 1983. Mol. Cell. Biol. 3: 2156-2165) and the pVL series (Luckow and Summers, 1989. Virology 170: 31-39).

In yet another embodiment, a nucleic acid of the invention is expressed in mammalian cells using a mammalian expression vector. Examples of mammalian expression vectors include pCDM8 (Seed, 1987. Nature 329: 840) and pMT2PC (Kaufman, et al., 1987. EMBO J. 6: 187-195). When used in mammalian cells, the expression vector's control functions are often provided by viral regulatory elements. For example, commonly used promoters are derived from polyoma, adenovirus 2, cytomegalovirus, and simian virus 40 (SV40). For other suitable expression systems for both prokaryotic and eukaryotic cells, see, e.g., Chapters 16 and 17 of Sambrook, et al., Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989.

In another embodiment, the recombinant mammalian expression vector is capable of directing expression of the nucleic acid preferentially in a particular cell type (e.g., tissue-specific regulatory elements are used to express the nucleic acid). Tissue-specific regulatory elements are known in the art. Non-limiting examples of suitable tissue-specific promoters include the albumin promoter (liver-specific; see, Pinkert, et al., 1987. Genes Dev. 1: 268-277), lymphoid-specific promoters (see, Calame and Eaton, 1988. Adv. Immunol. 43: 235-275), in particular promoters of T cell receptors (see, Winoto and Baltimore, 1989. EMBO J. 8: 729-733) and immunoglobulins (see, Banerji, et al., 1983. Cell 33: 729-740; Queen and Baltimore, 1983. Cell 33: 741-748), neuron-specific promoters (e.g., the neurofilament promoter; see, Byrne and Ruddle, 1989. Proc. Natl. Acad. Sci. USA 86: 5473-5477), pancreas-specific promoters (see, Edlund, et al., 1985. Science 230: 912-916), and mammary gland-specific promoters (e.g., milk whey promoter; U.S. Pat. No. 4,873,316 and European Application Publication No. 264,166). Developmentally regulated promoters are also encompassed, e.g., the murine hox promoters (Kessel and Gruss, 1990. Science 249: 374-379) and the α-fetoprotein promoter (see, Camper and Tilghman, 1989. Genes Dev. 3: 537-546).

The invention further provides a recombinant expression vector comprising a DNA molecule of the invention cloned into the expression vector in an antisense orientation. That is, the DNA molecule is operatively linked to a regulatory sequence in a manner that allows for expression (by transcription of the DNA molecule) of an RNA molecule that is antisense to SECP mRNA. Regulatory sequences operatively linked to a nucleic acid cloned in the antisense orientation can be chosen that direct the continuous expression of the antisense RNA molecule in a variety of cell types, for instance viral promoters and/or enhancers, or regulatory sequences can be chosen that direct constitutive, tissue-specific or cell type-specific expression of antisense RNA. The antisense expression vector can be in the form of a recombinant plasmid, phagemid or attenuated virus in which antisense nucleic acids are produced under the control of a high-efficiency regulatory region, the activity of which can be determined by the cell type into which the vector is introduced. For a discussion of the regulation of gene expression using antisense genes, see, e.g., Weintraub, et al., "Antisense RNA as a molecular tool for genetic analysis," Reviews - Trends in Genetics, Vol. 1(1) 1986.
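The relationship between a cloned insert and the antisense RNA it produces can be stated precisely: when the coding strand is placed in reverse orientation behind the promoter, the transcript is the reverse complement of the sense sequence, written with U in place of T. A minimal sketch, with a hypothetical function name and toy input:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_rna(sense_dna: str) -> str:
    """RNA (5'->3') produced when `sense_dna` (coding strand, 5'->3') is cloned in
    antisense orientation: its reverse complement, with U in place of T."""
    seq = sense_dna.strip().upper()
    if set(seq) - set("ACGT"):
        raise ValueError("expected only A, C, G and T")
    return seq.translate(_COMPLEMENT)[::-1].replace("T", "U")


if __name__ == "__main__":
    print(antisense_rna("ATGGCC"))  # pairs base-for-base with the mRNA 5'-AUGGCC-3'
```

Because the transcript is complementary to the SECP mRNA along its length, it can base-pair with that mRNA, which is the basis for the antisense regulation discussed above.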

Another aspect of the invention pertains to host cells into which a recombinant expression vector of the invention has been introduced. The terms "host cell" and "recombinant host cell" are used interchangeably herein. It is understood that such terms refer not only to the particular subject cell but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein.

A host cell can be any prokaryotic or eukaryotic cell. For example, SECP protein can be expressed in bacterial cells such as Escherichia coli, insect cells, yeast or mammalian cells (such as Chinese hamster ovary (CHO) cells or COS cells). Other suitable host cells are known to those skilled in the art.

Vector DNA can be introduced into prokaryotic or eukaryotic cells via conventional transformation or transfection techniques. As used herein, the terms "transformation" and "transfection" are intended to refer to a variety of art-recognized techniques for introducing foreign nucleic acid (e.g., DNA) into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, or electroporation. Suitable methods for transforming or transfecting host cells can be found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989), and other laboratory manuals.

For stable transfection of mammalian cells, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., resistance to antibiotics) is generally introduced into the host cells along with the gene of interest. Various selectable markers include those that confer resistance to drugs, such as G418, hygromycin and methotrexate. Nucleic acid encoding a selectable marker can be introduced into a host cell on the same vector as that encoding SECP or can be introduced on a separate vector. Cells stably transfected with the introduced nucleic acid can be identified by drug selection (e.g., cells that have incorporated the selectable marker gene will survive, while the other cells die).

A host cell of the invention, such as a prokaryotic or eukaryotic host cell in culture, can be used to produce (i.e., express) SECP protein. Accordingly, the invention further provides methods for producing SECP protein using the host cells of the invention. In one embodiment, the method comprises culturing the host cell of the invention (i.e., into which a recombinant expression vector encoding SECP protein has been introduced) in a suitable medium such that SECP protein is produced. In another embodiment, the method further comprises isolating SECP protein from the medium or the host cell.

Transgenic Animals 

The host cells of the invention can also be used to produce non-human transgenic animals. For example, in one embodiment, a host cell of the invention is a fertilized oocyte or an embryonic stem cell into which SECP protein-coding sequences have been introduced. These host cells can then be used to create non-human transgenic animals in which exogenous SECP sequences have been introduced into their genome or homologous recombinant animals in which endogenous SECP sequences have been altered. Such animals are useful for studying the function and/or activity of SECP protein and for identifying and/or evaluating modulators of SECP protein activity. As used herein, a "transgenic animal" is a non-human animal, preferably a mammal, more preferably a rodent such as a rat or mouse, in which one or more of the cells of the animal includes a transgene. Other examples of transgenic animals include non-human primates, sheep, dogs, cows, goats, chickens, amphibians, etc.

A transgene is exogenous DNA that is integrated into the genome of a cell from which a transgenic animal develops and that remains in the genome of the mature animal, thereby directing the expression of an encoded gene product in one or more cell types or tissues of the transgenic animal. As used herein, a "homologous recombinant animal" is a non-human animal, preferably a mammal, more preferably a mouse, in which an endogenous SECP gene has been altered by homologous recombination between the endogenous gene and an exogenous DNA molecule introduced into a cell of the animal, e.g., an embryonic cell of the animal, prior to development of the animal.

A transgenic animal of the invention can be created by introducing SECP-encoding nucleic acid into the male pronuclei of a fertilized oocyte (e.g., by micro-injection or retroviral infection) and allowing the oocyte to develop in a pseudopregnant female foster animal. The human SECP cDNA sequences of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and/or 56 can be introduced as a transgene into the genome of a non-human animal. Alternatively, a non-human homologue of the human SECP gene, such as a mouse SECP gene, can be isolated based on hybridization to the human SECP cDNA (described further supra) and used as a transgene. Intronic sequences and polyadenylation signals can also be included in the transgene to increase the efficiency of expression of the transgene. One or more tissue-specific regulatory sequences can be operably linked to the SECP transgene to direct expression of SECP protein to particular cells. Methods for generating transgenic animals via embryo manipulation and micro-injection, particularly animals such as mice, have become conventional in the art and are described, for example, in U.S. Patent Nos. 4,736,866; 4,870,009; and 4,873,191; and Hogan, 1986. In: Manipulating the Mouse Embryo, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. Similar methods are used for production of other transgenic animals. A transgenic founder animal can be identified based upon the presence of the SECP transgene in its genome and/or expression of SECP mRNA in tissues or cells of the animals. A transgenic founder animal can then be used to breed additional animals carrying the transgene. Moreover, transgenic animals carrying a transgene encoding SECP protein can further be bred to other transgenic animals carrying other transgenes.

To create a homologous recombinant animal, a vector is prepared which contains at least a portion of a SECP gene into which a deletion, addition or substitution has been introduced to thereby alter, e.g., functionally disrupt, the SECP gene. The SECP gene can be a human gene (e.g., the cDNA of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56), but more preferably is a non-human homologue of a human SECP gene. For example, a mouse homologue of the human SECP gene of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 can be used to construct a homologous recombination vector suitable for altering an endogenous SECP gene in the mouse genome. In one embodiment, the vector is designed such that, upon homologous recombination, the endogenous SECP gene is functionally disrupted (i.e., no longer encodes a functional protein; also referred to as a "knock out" vector). Alternatively, the vector can be designed such that, upon homologous recombination, the endogenous SECP gene is mutated or otherwise altered but still encodes functional protein (e.g., the upstream regulatory region can be altered to thereby alter the expression of the endogenous SECP protein). In the homologous recombination vector, the altered portion of the SECP gene is flanked at its 5'- and 3'-termini by additional nucleic acid of the SECP gene to allow for homologous recombination to occur between the exogenous SECP gene carried by the vector and an endogenous SECP gene in an embryonic stem cell. The additional flanking SECP nucleic acid is of sufficient length for successful homologous recombination with the endogenous gene. Typically, several kilobases (kb) of flanking DNA (both at the 5'- and 3'-termini) are included in the vector. See, e.g., Thomas, et al., 1987. Cell 51: 503 for a description of homologous recombination vectors. The vector is then introduced into an embryonic stem cell line (e.g., by electroporation) and cells in which the introduced SECP gene has homologously recombined with the endogenous SECP gene are selected. See, e.g., Li, et al., 1992. Cell 69: 915.
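The layout of a replacement-type targeting vector, in which the altered segment is flanked on both sides by homologous SECP sequence, can be sketched with plain strings. This is a hypothetical illustration: real homology arms are several kilobases long, as stated above, `marker` stands for whatever altered or selectable sequence is chosen, and the coordinates and names are assumptions.

```python
def build_targeting_construct(genomic: str, delete_start: int, delete_end: int,
                              marker: str, arm_length: int) -> str:
    """Return 5' arm + marker + 3' arm, replacing genomic[delete_start:delete_end].

    Coordinates are 0-based and end-exclusive. Each arm is `arm_length` bases
    taken directly from `genomic` next to the replaced segment.
    """
    if not 0 <= delete_start < delete_end <= len(genomic):
        raise ValueError("invalid deletion coordinates")
    if arm_length < 1:
        raise ValueError("arm_length must be positive")
    if delete_start < arm_length or len(genomic) - delete_end < arm_length:
        raise ValueError("not enough flanking sequence for the requested arms")
    five_prime_arm = genomic[delete_start - arm_length:delete_start]
    three_prime_arm = genomic[delete_end:delete_end + arm_length]
    return five_prime_arm + marker + three_prime_arm


if __name__ == "__main__":
    toy_locus = "AAAACCCCGGGGTTTT"  # placeholder; GGGG stands for the exon to disrupt
    print(build_targeting_construct(toy_locus, 8, 12, "NNNN", 4))
```

Keeping both arms as unmodified genomic sequence is what allows the two crossover events that swap the marker into the endogenous locus.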

The selected cells are then micro-injected into a blastocyst of an animal (e.g., a mouse) to form aggregation chimeras. See, e.g., Bradley, 1987. In: Teratocarcinomas and Embryonic Stem Cells: A Practical Approach, Robertson, ed. IRL, Oxford, pp. 113-152. A chimeric embryo can then be implanted into a suitable pseudopregnant female foster animal and the embryo brought to term. Progeny harboring the homologously recombined DNA in their germ cells can be used to breed animals in which all cells of the animal contain the homologously recombined DNA by germline transmission of the transgene. Methods for constructing homologous recombination vectors and homologous recombinant animals are described further in Bradley, 1991. Curr. Opin. Biotechnol. 2: 823-829; PCT International Publication Nos.: WO 90/11354; WO 91/01140; WO 92/0968; and WO 93/04169.

In another embodiment, transgenic non-human animals can be produced that contain selected systems that allow for regulated expression of the transgene. One example of such a system is the cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP recombinase system, see, e.g., Lakso, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 6232-6236. Another example of a recombinase system is the FLP recombinase system of Saccharomyces cerevisiae. See, O'Gorman, et al., 1991. Science 251: 1351-1355. If a cre/loxP recombinase system is used to regulate expression of the transgene, animals containing transgenes encoding both the Cre recombinase and a selected protein are required. Such animals can be provided through the construction of "double" transgenic animals, e.g., by mating two transgenic animals, one containing a transgene encoding a selected protein and the other containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced according to the methods described in Wilmut, et al., 1997. Nature 385: 810-813. In brief, a cell (e.g., a somatic cell) from the transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated oocyte from an animal of the same species from which the quiescent cell is isolated. The reconstructed oocyte is then cultured such that it develops to a morula or blastocyst and is then transferred to a pseudopregnant female foster animal. The offspring borne by this female foster animal will be a clone of the animal from which the cell (e.g., the somatic cell) is isolated.

Pharmaceutical Compositions 

The SECP nucleic acid molecules, SECP proteins, and anti-SECP antibodies (also referred to herein as "active compounds") of the invention, and derivatives, fragments, analogs and homologs thereof, can be incorporated into pharmaceutical compositions suitable for administration. Such compositions typically comprise the nucleic acid molecule, protein, or antibody and a pharmaceutically acceptable carrier. As used herein, "pharmaceutically acceptable carrier" is intended to include any and all solvents, dispersion media, coatings, antibacterial and antifungal agents, isotonic and absorption delaying agents, and the like, compatible with pharmaceutical administration. Suitable carriers are described in the most recent edition of Remington's Pharmaceutical Sciences, a standard reference text in the field, which is incorporated herein by reference. Preferred examples of such carriers or diluents include, but are not limited to, water, saline, Ringer's solutions, dextrose solution, and 5% human serum albumin. Liposomes and other non-aqueous (i.e., lipophilic) vehicles such as fixed oils may also be used. The use of such media and agents for pharmaceutically active substances is well known in the art. Except insofar as any conventional media or agent is incompatible with the active compound, use thereof in the compositions is contemplated. Supplementary active compounds can also be incorporated into the compositions.

A pharmaceutical composition of the invention is formulated to be compatible with its intended route of administration. Examples of routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (i.e., topical), transmucosal, and rectal administration. Solutions or suspensions used for parenteral, intradermal, or subcutaneous application can include the following components: a sterile diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfite; chelating agents such as ethylenediaminetetraacetic acid (EDTA); buffers such as acetates, citrates or phosphates; and agents for the adjustment of tonicity such as sodium chloride or dextrose. The pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple dose vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous solutions (where water soluble) or dispersions and sterile powders for the extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous administration, suitable carriers include physiological saline, bacteriostatic water, Cremophor EL™ (BASF, Parsippany, N.J.) or phosphate buffered saline (PBS). In all cases, the composition must be sterile and should be fluid to the extent that easy syringeability exists. It must be stable under the conditions of manufacture and storage and must be preserved against the contaminating action of microorganisms such as bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can be maintained, for example, by the use of a coating such as lecithin, by the maintenance of the required particle size in the case of dispersion and by the use of surfactants. Prevention of the action of microorganisms can be achieved by various antibacterial and antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid, thimerosal, and the like. In many cases, it will be preferable to include isotonic agents, for example, sugars, polyalcohols such as mannitol or sorbitol, or sodium chloride, in the composition. Prolonged absorption of the injectable compositions can be brought about by including in the composition an agent which delays absorption, for example, aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active compound (e.g., a SECP protein or anti-SECP antibody) in the required amount in an appropriate solvent with one or a combination of ingredients enumerated above, as required, followed by filter sterilization. Generally, dispersions are prepared by incorporating the active compound into a sterile vehicle that contains a basic dispersion medium and the required other ingredients from those enumerated above. In the case of sterile powders for the preparation of sterile injectable solutions, methods of preparation are vacuum drying and freeze-drying, which yield a powder of the active ingredient plus any additional desired ingredient from a previously sterile-filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They can be enclosed in gelatin capsules or compressed into tablets. For the purpose of oral therapeutic administration, the active compound can be incorporated with excipients and used in the form of tablets, troches, or capsules. Oral compositions can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound in the fluid carrier is applied orally and swished and expectorated or swallowed. Pharmaceutically compatible binding agents and/or adjuvant materials can be included as part of the composition. The tablets, pills, capsules, troches and the like can contain any of the following ingredients, or compounds of a similar nature: a binder such as microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint, methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an aerosol spray from a pressurized container or dispenser which contains a suitable propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For transmucosal or transdermal administration, penetrants appropriate to the barrier to be permeated are used in the formulation. Such penetrants are generally known in the art, and include, for example, for transmucosal administration, detergents, bile salts, and fusidic acid derivatives. Transmucosal administration can be accomplished through the use of nasal sprays or suppositories. For transdermal administration, the active compounds are formulated into ointments, salves, gels, or creams as generally known in the art.

The compounds can also be prepared in the form of suppositories (e.g., with conventional suppository bases such as cocoa butter and other glycerides) or retention enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will protect the compound against rapid elimination from the body, such as a controlled release formulation, including implants and microencapsulated delivery systems. Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate, polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid. Methods for preparation of such formulations will be apparent to those skilled in the art. The materials can also be obtained commercially from Alza Corporation and Nova Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically acceptable carriers. These can be prepared according to methods known to those skilled in the art, for example, as described in U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in dosage unit form for ease of administration and uniformity of dosage. Dosage unit form as used herein refers to physically discrete units suited as unitary dosages for the subject to be treated, each unit containing a predetermined quantity of active compound calculated to produce the desired therapeutic effect in association with the required pharmaceutical carrier. The specifications for the dosage unit forms of the invention are dictated by and directly dependent on the unique characteristics of the active compound and the particular therapeutic effect to be achieved, and the limitations inherent in the art of compounding such an active compound for the treatment of individuals.

The nucleic acid molecules of the invention can be inserted into vectors and used as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for example, intravenous injection, local administration (see, e.g., U.S. Patent No. 5,328,470) or by stereotactic injection (see, e.g., Chen, et al., 1994. Proc. Natl. Acad. Sci. USA 91: 3054-3057). The pharmaceutical preparation of the gene therapy vector can include the gene therapy vector in an acceptable diluent, or can comprise a slow release matrix in which the gene delivery vehicle is embedded. Alternatively, where the complete gene delivery vector can be produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical preparation can include one or more cells that produce the gene delivery system.

The pharmaceutical compositions can be included in a container, pack, or dispenser together with instructions for administration.

Screening and Detection Methods 

The nucleic acid molecules, proteins, protein homologues, and antibodies described herein can be used in one or more of the following methods: (A) screening assays; (B) detection assays (e.g., chromosomal mapping, cell and tissue typing, forensic biology); (C) predictive medicine (e.g., diagnostic assays, prognostic assays, monitoring clinical trials, and pharmacogenomics); and (D) methods of treatment (e.g., therapeutic and prophylactic).

The isolated nucleic acid molecules of the present invention can be used to express SECP protein (e.g., via a recombinant expression vector in a host cell in gene therapy applications), to detect SECP mRNA (e.g., in a biological sample) or a genetic lesion in an SECP gene, and to modulate SECP activity, as described further below. In addition, the SECP proteins can be used to screen drugs or compounds that modulate the SECP protein activity or expression as well as to treat disorders characterized by insufficient or excessive production of SECP protein or production of SECP protein forms that have decreased or aberrant activity compared to SECP wild-type protein. In addition, the anti-SECP antibodies of the present invention can be used to detect and isolate SECP proteins and modulate SECP activity.

The invention further pertains to novel agents identified by the screening assays 
described herein and uses thereof for treatments as previously described. 

Screening Assays

The invention provides a method (also referred to herein as a "screening assay") for identifying modulators, i.e., candidate or test compounds or agents (e.g., peptides, peptidomimetics, small molecules or other drugs) that bind to SECP proteins or have a stimulatory or inhibitory effect on, e.g., SECP protein expression or SECP protein activity. The invention also includes compounds identified in the screening assays described herein.

In one embodiment, the invention provides assays for screening candidate or test compounds which bind to or modulate the activity of the membrane-bound form of a SECP protein, or polypeptide or biologically-active portion thereof. The test compounds of the invention can be obtained using any of the numerous approaches in combinatorial library methods known in the art, including: biological libraries; spatially addressable parallel solid phase or solution phase libraries; synthetic library methods requiring deconvolution; the "one-bead one-compound" library method; and synthetic library methods using affinity chromatography selection. The biological library approach is limited to peptide libraries, while the other four approaches are applicable to peptide, non-peptide oligomer or small molecule libraries of compounds. See, e.g., Lam, 1997. Anticancer Drug Design 12: 145.

A "small molecule", as used herein, is meant to refer to a composition that has a molecular weight of less than about 5 kD and most preferably less than about 4 kD. Small molecules can be, e.g., nucleic acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other organic or inorganic molecules. Libraries of chemical and/or biological mixtures, such as fungal, bacterial, or algal extracts, are known in the art and can be screened with any of the assays of the invention.

Examples of methods for the synthesis of molecular libraries can be found in the art, for example in: DeWitt, et al., 1993. Proc. Natl. Acad. Sci. U.S.A. 90: 6909; Erb, et al., 1994. Proc. Natl. Acad. Sci. U.S.A. 91: 11422; Zuckermann, et al., 1994. J. Med. Chem. 37: 2678; Cho, et al., 1993. Science 261: 1303; Carrell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2059; Carell, et al., 1994. Angew. Chem. Int. Ed. Engl. 33: 2061; and Gallop, et al., 1994. J. Med. Chem. 37: 1233.

Libraries of compounds may be presented in solution (e.g., Houghten, 1992. Biotechniques 13: 412-421), or on beads (Lam, 1991. Nature 354: 82-84), on chips (Fodor, 1993. Nature 364: 555-556), bacteria (Ladner, U.S. Patent No. 5,223,409), spores (Ladner, U.S. Patent No. 5,233,409), plasmids (Cull, et al., 1992. Proc. Natl. Acad. Sci. USA 89: 1865-1869) or on phage (Scott and Smith, 1990. Science 249: 386-390; Devlin, 1990. Science 249: 404-406; Cwirla, et al., 1990. Proc. Natl. Acad. Sci. U.S.A. 87: 6378-6382; Felici, 1991. J. Mol. Biol. 222: 301-310; Ladner, U.S. Patent No. 5,233,409).

In one embodiment, an assay is a cell-based assay in which a cell which expresses a membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell surface is contacted with a test compound and the ability of the test compound to bind to a SECP protein is determined. The cell, for example, can be of mammalian origin or a yeast cell. Determining the ability of the test compound to bind to the SECP protein can be accomplished, for example, by coupling the test compound with a radioisotope or enzymatic label such that binding of the test compound to the SECP protein or biologically-active portion thereof can be determined by detecting the labeled compound in a complex. For example, test compounds can be labeled with ¹²⁵I, ³⁵S, ¹⁴C, or ³H, either directly or indirectly, and the radioisotope detected by direct counting of radioemission or by scintillation counting. Alternatively, test compounds can be enzymatically labeled with, for example, horseradish peroxidase, alkaline phosphatase, or luciferase, and the enzymatic label detected by determination of conversion of an appropriate substrate to product. In one embodiment, the assay comprises contacting a cell which expresses a membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell surface with a known compound which binds SECP to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with a SECP protein, wherein determining the ability of the test compound to interact with a SECP protein comprises determining the ability of the test compound to preferentially bind to SECP protein or a biologically-active portion thereof as compared to the known compound.
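As a purely illustrative aid, a competition format of this kind is often read out by labeling the known compound and measuring how much of its bound signal is lost once the test compound is added. The minimal sketch below assumes that setup; the function name, the separately measured background (nonspecific) signal, and the example counts are hypothetical and are not part of the method described above.

```python
def percent_displacement(signal_known_only: float,
                         signal_with_test: float,
                         background: float = 0.0) -> float:
    """Percent of the labeled known compound's specific signal lost when a
    test compound is added (hypothetical readout, not the patent's method).

    signal_known_only: bound label measured with the known compound alone.
    signal_with_test:  bound label measured after adding the test compound.
    background:        nonspecific signal, assumed to be measured separately.
    """
    specific_known = signal_known_only - background
    if specific_known <= 0:
        raise ValueError("known-compound signal must exceed background")
    # Clamp at zero so noise below background is not read as >100% displacement.
    specific_with_test = max(signal_with_test - background, 0.0)
    return 100.0 * (1.0 - specific_with_test / specific_known)


# Made-up counts: 1000 cpm alone, 400 cpm with test compound, 200 cpm background.
print(percent_displacement(1000.0, 400.0, 200.0))  # 75.0
```

A larger displacement suggests the test compound competes more effectively with the known compound for SECP; how large a value counts as "preferential" binding is left open by the text.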

In another embodiment, an assay is a cell-based assay comprising contacting a cell expressing a membrane-bound form of SECP protein, or a biologically-active portion thereof, on the cell surface with a test compound and determining the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of the SECP protein or biologically-active portion thereof. Determining the ability of the test compound to modulate the activity of SECP or a biologically-active portion thereof can be accomplished, for example, by determining the ability of the SECP protein to bind to or interact with a SECP target molecule. As used herein, a "target molecule" is a molecule with which a SECP protein binds or interacts in nature, for example, a molecule on the surface of a cell which expresses a SECP interacting protein, a molecule on the surface of a second cell, a molecule in the extracellular milieu, a molecule associated with the internal surface of a cell membrane or a cytoplasmic molecule. A SECP target molecule can be a non-SECP molecule or a SECP protein or polypeptide of the invention. In one embodiment, a SECP target molecule is a component of a signal transduction pathway that facilitates transduction of an extracellular signal (e.g., a signal generated by binding of a compound to a membrane-bound SECP molecule) through the cell membrane and into the cell. The target, for example, can be a second intracellular protein that has catalytic activity or a protein that facilitates the association of downstream signaling molecules with SECP.

Determining the ability of the SECP protein to bind to or interact with a SECP target molecule can be accomplished by one of the methods described above for determining direct binding. In one embodiment, determining the ability of the SECP protein to bind to or interact with a SECP target molecule can be accomplished by determining the activity of the target molecule. For example, the activity of the target molecule can be determined by detecting induction of a cellular second messenger of the target (e.g., intracellular Ca²⁺, diacylglycerol, IP3, etc.), detecting catalytic/enzymatic activity of the target on an appropriate substrate, detecting the induction of a reporter gene (comprising a SECP-responsive regulatory element operatively linked to a nucleic acid encoding a detectable marker, e.g., luciferase), or detecting a cellular response, for example, cell survival, cellular differentiation, or cell proliferation.

In yet another embodiment, an assay of the invention is a cell-free assay comprising contacting a SECP protein or biologically-active portion thereof with a test compound and determining the ability of the test compound to bind to the SECP protein or biologically-active portion thereof. Binding of the test compound to the SECP protein can be determined either directly or indirectly as described above. In one such embodiment, the assay comprises contacting the SECP protein or biologically-active portion thereof with a known compound which binds SECP to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with a SECP protein, wherein determining the ability of the test compound to interact with a SECP protein comprises determining the ability of the test compound to preferentially bind to SECP or biologically-active portion thereof as compared to the known compound.

In still another embodiment, an assay is a cell-free assay comprising contacting SECP protein or biologically-active portion thereof with a test compound and determining the ability of the test compound to modulate (e.g., stimulate or inhibit) the activity of the SECP protein or biologically-active portion thereof. Determining the ability of the test compound to modulate the activity of SECP can be accomplished, for example, by determining the ability of the SECP protein to bind to a SECP target molecule by one of the methods described above for determining direct binding. In an alternative embodiment, determining the ability of the test compound to modulate the activity of SECP protein can be accomplished by determining the ability of the SECP protein to further modulate a SECP target molecule. For example, the catalytic/enzymatic activity of the target molecule on an appropriate substrate can be determined as described, supra.

In yet another embodiment, the cell-free assay comprises contacting the SECP protein or biologically-active portion thereof with a known compound which binds SECP protein to form an assay mixture, contacting the assay mixture with a test compound, and determining the ability of the test compound to interact with a SECP protein, wherein determining the ability of the test compound to interact with a SECP protein comprises determining the ability of the SECP protein to preferentially bind to or modulate the activity of a SECP target molecule.



The cell-free assays of the invention are amenable to use of both the soluble form and the membrane-bound form of SECP protein. In the case of cell-free assays comprising the membrane-bound form of SECP protein, it may be desirable to utilize a solubilizing agent such that the membrane-bound form of SECP protein is maintained in solution. Examples of such solubilizing agents include non-ionic detergents such as n-octylglucoside, n-dodecylglucoside, n-dodecylmaltoside, octanoyl-N-methylglucamide, decanoyl-N-methylglucamide, Triton® X-100, Triton® X-114, Thesit®, isotridecylpoly(ethylene glycol ether)n, N-dodecyl-N,N-dimethyl-3-ammonio-1-propane sulfonate, 3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate (CHAPS), or 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propane sulfonate (CHAPSO).

In more than one embodiment of the above assay methods of the invention, it may be desirable to immobilize either SECP protein or its target molecule to facilitate separation of complexed from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay. Binding of a test compound to SECP protein, or interaction of SECP protein with a target molecule in the presence and absence of a candidate compound, can be accomplished in any vessel suitable for containing the reactants. Examples of such vessels include microtiter plates, test tubes, and microcentrifuge tubes. In one embodiment, a fusion protein can be provided that adds a domain that allows one or both of the proteins to be bound to a matrix. For example, GST-SECP fusion proteins or GST-target fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, MO) or glutathione-derivatized microtiter plates, which are then combined with the test compound or the test compound and either the non-adsorbed target protein or SECP protein, and the mixture is incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads or microtiter plate wells are washed to remove any unbound components, the matrix is immobilized in the case of beads, and complex formation is determined either directly or indirectly, for example, as described, supra. Alternatively, the complexes can be dissociated from the matrix, and the level of SECP protein binding or activity determined using standard techniques.

Other techniques for immobilizing proteins on matrices can also be used in the screening assays of the invention. For example, either the SECP protein or its target molecule can be immobilized utilizing conjugation of biotin and streptavidin. Biotinylated SECP protein or target molecules can be prepared from biotin-NHS (N-hydroxysuccinimide) using techniques well known within the art (e.g., biotinylation kit, Pierce Chemicals, Rockford, Ill.), and immobilized in the wells of streptavidin-coated 96-well plates (Pierce Chemical). Alternatively, antibodies reactive with SECP protein or target molecules, but which do not interfere with binding of the SECP protein to its target molecule, can be derivatized to the wells of the plate, and unbound target or SECP protein trapped in the wells by antibody conjugation. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the SECP protein or target molecule, as well as enzyme-linked assays that rely on detecting an enzymatic activity associated with the SECP protein or target molecule.

In another embodiment, modulators of SECP protein expression are identified in a method wherein a cell is contacted with a candidate compound and the expression of SECP mRNA or protein in the cell is determined. The level of expression of SECP mRNA or protein in the presence of the candidate compound is compared to the level of expression of SECP mRNA or protein in the absence of the candidate compound. The candidate compound can then be identified as a modulator of SECP mRNA or protein expression based upon this comparison. For example, when expression of SECP mRNA or protein is greater (i.e., statistically significantly greater) in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of SECP mRNA or protein expression. Alternatively, when expression of SECP mRNA or protein is less (statistically significantly less) in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of SECP mRNA or protein expression. The level of SECP mRNA or protein expression in the cells can be determined by methods described herein for detecting SECP mRNA or protein.
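The text leaves the choice of statistical test open. As one hedged illustration, the comparison could be made with a two-sided permutation test on replicate expression measurements taken with and without the candidate compound. The choice of a permutation test, the 0.05 threshold, the function names, and the replicate values below are all assumptions made for this sketch.

```python
import random
from statistics import fmean


def permutation_p_value(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean expression."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign replicates to the two groups
        diff = abs(fmean(pooled[:n_treated]) - fmean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


def classify_modulator(treated, control, alpha=0.05):
    """Label a candidate compound from expression with (treated) and without (control) it."""
    p = permutation_p_value(treated, control)
    if p >= alpha:
        return "no significant effect"
    return "stimulator" if fmean(treated) > fmean(control) else "inhibitor"


# Hypothetical normalized SECP mRNA levels from six replicate wells each.
with_compound = [10.1, 10.4, 9.9, 10.3, 10.2, 10.0]
without_compound = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8]
print(classify_modulator(with_compound, without_compound))  # stimulator
```

A permutation test is used here only because it needs no distributional assumptions and runs on the standard library; a t-test or another method could equally serve.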

In yet another aspect of the invention, the SECP proteins can be used as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent No. 5,283,317; Zervos, et al., 1993. Cell 72: 223-232; Madura, et al., 1993. J. Biol. Chem. 268: 12046-12054; Bartel, et al., 1993. Biotechniques 14: 920-924; Iwabuchi, et al., 1993. Oncogene 8: 1693-1696; and Brent WO 94/10300), to identify other proteins that bind to or interact with SECP ("SECP-binding proteins" or "SECP-bp") and modulate SECP activity. Such SECP-binding proteins are also likely to be involved in the propagation of signals by the SECP proteins as, for example, upstream or downstream elements of the SECP pathway.

The two-hybrid system is based on the modular nature of most transcription factors, which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two different DNA constructs. In one construct, the gene that codes for SECP is fused to a gene encoding the DNA-binding domain of a known transcription factor (e.g., GAL-4). In the other construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known transcription factor. If the "bait" and the "prey" proteins are able to interact in vivo, forming a SECP-dependent complex, the DNA-binding and activation domains of the transcription factor are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., LacZ) that is operably linked to a transcriptional regulatory site responsive to the transcription factor. Expression of the reporter gene can be detected, and cell colonies containing the functional transcription factor can be isolated and used to obtain the cloned gene that encodes the protein which interacts with SECP.

The invention further pertains to novel agents identified by the aforementioned screening 
assays and uses thereof for treatments as described herein. 

Detection Assays 

Portions or fragments of the cDNA sequences identified herein (and the corresponding complete gene sequences) can be used in numerous ways as polynucleotide reagents. By way of example, and not of limitation, these sequences can be used to: (i) map their respective genes on a chromosome and, thus, locate gene regions associated with genetic disease; (ii) identify an individual from a minute biological sample (tissue typing); and (iii) aid in forensic identification of a biological sample. Some of these applications are described in the subsections below.

Chromosome Mapping

Once the sequence (or a portion of the sequence) of a gene has been isolated, this sequence can be used to map the location of the gene on a chromosome. This process is called chromosome mapping. Accordingly, portions or fragments of the SECP sequences shown in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, or fragments or derivatives thereof, can be used to map the location of the SECP genes, respectively, on a chromosome. The mapping of the SECP sequences to chromosomes is an important first step in correlating these sequences with genes associated with disease.

Briefly, SECP genes can be mapped to chromosomes by preparing PCR primers (preferably 15-25 bp in length) from the SECP sequences. Computer analysis of the SECP sequences can be used to rapidly select primers that do not span more than one exon in the genomic DNA, since such primers would complicate the amplification process. These primers can then be used for PCR screening of somatic cell hybrids containing individual human chromosomes. Only those hybrids containing the human gene corresponding to the SECP sequences will yield an amplified fragment.
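The single-exon constraint on primer selection can be sketched as a simple enumeration. This minimal sketch assumes exon boundaries are already known and supplied as exon lengths in cDNA order (a hypothetical input); it applies only the 15-25 bp length window and the single-exon rule stated above, and ignores other criteria, such as melting temperature, that practical primer-design software also weighs.

```python
def candidate_primers(cdna, exon_lengths, min_len=15, max_len=25):
    """Return (start, primer) pairs that lie entirely within a single exon.

    cdna:         cDNA sequence as a string.
    exon_lengths: lengths of consecutive exons in cDNA order (assumed known).
    """
    if sum(exon_lengths) != len(cdna):
        raise ValueError("exon lengths must add up to the cDNA length")
    primers = []
    exon_start = 0
    for length in exon_lengths:
        exon_end = exon_start + length
        for start in range(exon_start, exon_end):
            for k in range(min_len, max_len + 1):
                if start + k > exon_end:
                    break  # longer primers from this start would cross into the next exon
                primers.append((start, cdna[start:start + k]))
        exon_start = exon_end
    return primers


# Toy cDNA made of two 20-nt exons; no returned primer mixes the two exons.
toy = "A" * 20 + "C" * 20
found = candidate_primers(toy, [20, 20])
print(len(found))  # 42
```

Each 20-nt exon here contributes 6 + 5 + 4 + 3 + 2 + 1 = 21 primers of length 15 to 20, giving 42 in total.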

Somatic cell hybrids are prepared by fusing somatic cells from different mammals (e.g., human and mouse cells). As hybrids of human and mouse cells grow and divide, they gradually lose human chromosomes in random order but retain the mouse chromosomes. By using media in which mouse cells cannot grow, because they lack a particular enzyme, but in which human cells can, the one human chromosome that contains the gene encoding the needed enzyme will be retained. By using various media, panels of hybrid cell lines can be established. Each cell line in a panel contains either a single human chromosome or a small number of human chromosomes, and a full set of mouse chromosomes, allowing easy mapping of individual genes to specific human chromosomes. See, e.g., D'Eustachio, et al., 1983. Science 220: 919-924. Somatic cell hybrids containing only fragments of human chromosomes can also be produced by using human chromosomes with translocations and deletions.

PCR mapping of somatic cell hybrids is a rapid procedure for assigning a particular
sequence to a particular chromosome. Three or more sequences can be assigned per day using a
single thermal cycler. Using the SECP sequences to design oligonucleotide primers,
sublocalization can be achieved with panels of fragments from specific chromosomes.
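Scoring a somatic cell hybrid panel can be sketched as a concordance check: the gene is assigned to the human chromosome whose presence across the panel matches the pattern of PCR-positive hybrids. The panel composition and PCR results below are hypothetical, and real panels usually tolerate an occasional discordant line, which this exact-match sketch does not.

```python
from typing import Dict, List, Set

def concordant_chromosomes(panel: Dict[str, Set[str]],
                           pcr_positive: Dict[str, bool]) -> List[str]:
    """Return human chromosomes whose presence matches every hybrid's PCR result."""
    all_chroms = set().union(*panel.values())
    return sorted(
        chrom for chrom in all_chroms
        if all((chrom in panel[hybrid]) == positive
               for hybrid, positive in pcr_positive.items())
    )

# Hypothetical three-line panel; amplification seen in H1 and H2 only.
panel = {"H1": {"1", "7"}, "H2": {"7", "12"}, "H3": {"1", "12"}}
results = {"H1": True, "H2": True, "H3": False}
print(concordant_chromosomes(panel, results))  # ['7']
```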

Fluorescence in situ hybridization (FISH) of a DNA sequence to a metaphase
chromosomal spread can further be used to provide a precise chromosomal location in one step.
Chromosome spreads can be made using cells whose division has been blocked in metaphase by
a chemical such as colcemid that disrupts the mitotic spindle. The chromosomes can be treated
briefly with trypsin, and then stained with Giemsa. A pattern of light and dark bands develops
on each chromosome, so that the chromosomes can be identified individually. The FISH
technique can be used with a DNA sequence as short as 500 or 600 bases. However, clones
larger than 1,000 bases have a higher likelihood of binding to a unique chromosomal location
with sufficient signal intensity for simple detection. Preferably 1,000 bases, and more preferably
2,000 bases, will suffice to get good results in a reasonable amount of time. For a review of this
technique, see Verma, et al., Human Chromosomes: A Manual of Basic Techniques
(Pergamon Press, New York 1988).

Reagents for chromosome mapping can be used individually to mark a single
chromosome or a single site on that chromosome, or panels of reagents can be used for marking
multiple sites and/or multiple chromosomes. Reagents corresponding to non-coding regions of
the genes actually are preferred for mapping purposes, because coding sequences are more likely
to be conserved within gene families, thus increasing the chance of cross-hybridization during
chromosomal mapping.

Once a sequence has been mapped to a precise chromosomal location, the physical
position of the sequence on the chromosome can be correlated with genetic map data. Such data
are found, e.g., in McKusick, Mendelian Inheritance in Man (available on-line through the Johns
Hopkins University Welch Medical Library). The relationship between genes and disease
mapped to the same chromosomal region can then be identified through linkage analysis
(co-inheritance of physically adjacent genes), described in, e.g., Egeland, et al., 1987. Nature
325: 783-787.
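Linkage analysis is commonly quantified with a two-point LOD score: the base-10 logarithm of the likelihood of the observed meioses at recombination fraction θ relative to free recombination (θ = 0.5), i.e. LOD(θ) = R·log10 θ + NR·log10(1 − θ) + (R + NR)·log10 2 for R recombinant and NR non-recombinant meioses. The sketch below assumes phase-known meioses with already counted recombinants (hypothetical inputs); a LOD of 3 or more is the conventional threshold for declaring linkage. It illustrates the idea and is not presented as the specific method of the cited reference.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known meioses, computed in log space."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + n * math.log10(2.0))

def max_lod(recombinants: int, nonrecombinants: int, steps: int = 50):
    """Grid-search theta over (0, 0.5] and return (best_theta, best_lod)."""
    grid = [0.5 * k / steps for k in range(1, steps + 1)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])

# Hypothetical family data: 2 recombinants among 20 informative meioses.
theta_hat, best = max_lod(2, 18)
print(theta_hat, round(best, 2))  # 0.1 3.2
```

The grid search is used only for transparency; for this simple model the maximum lies at θ = R / (R + NR).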

Additionally, differences in the DNA sequences between individuals affected and
unaffected with a disease associated with the SECP gene can be determined. If a mutation is
observed in some or all of the affected individuals but not in any unaffected individuals, then the
mutation is likely to be the causative agent of the particular disease. Comparison of affected and
unaffected individuals generally involves first looking for structural alterations in the
chromosomes, such as deletions or translocations, that are visible from chromosome spreads or
detectable using PCR based on that DNA sequence. Ultimately, complete sequencing of genes
from several individuals can be performed to confirm the presence of a mutation and to
distinguish mutations from polymorphisms.
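The affected/unaffected comparison above reduces to a set difference: keep variants seen in at least one affected individual and in no unaffected individual. The sketch below assumes variants have already been called and named consistently; the individual IDs and variant labels are hypothetical. It also counts how many affected individuals carry each candidate.

```python
from typing import Dict, Set

def candidate_causal_variants(affected: Dict[str, Set[str]],
                              unaffected: Dict[str, Set[str]]) -> Dict[str, int]:
    """Map each variant absent from all unaffected individuals to its affected-carrier count."""
    in_unaffected = set().union(*unaffected.values())
    counts: Dict[str, int] = {}
    for variants in affected.values():
        for variant in variants - in_unaffected:
            counts[variant] = counts.get(variant, 0) + 1
    return counts

# Hypothetical variant calls.
affected = {"P1": {"c.100A>G", "c.250C>T"}, "P2": {"c.100A>G"}}
unaffected = {"C1": {"c.250C>T"}, "C2": set()}
print(candidate_causal_variants(affected, unaffected))  # {'c.100A>G': 2}
```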

Tissue Typing

The SECP sequences of the invention can also be used to identify individuals from
minute biological samples. In this technique, an individual's genomic DNA is digested with one
or more restriction enzymes and probed on a Southern blot to yield unique bands for
identification. The sequences of the invention are useful as additional DNA markers for RFLP
("restriction fragment length polymorphisms," as described in U.S. Patent No. 5,272,057).

Furthermore, the sequences of the invention can be used to provide an alternative
technique that determines the actual base-by-base DNA sequence of selected portions of an
individual's genome. Thus, the SECP sequences described herein can be used to prepare two
PCR primers from the 5'- and 3'-termini of the sequences. These primers can then be used to
amplify an individual's DNA and subsequently sequence it.
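Preparing the two primers from the 5'- and 3'-termini can be sketched as follows. The forward primer is read directly from the 5' end; the reverse primer is the reverse complement of the 3' end, so that it anneals to the opposite strand. The 20-nt default length is an assumption within the 15-25 bp range mentioned above, and only unambiguous A/C/G/T bases are handled.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def terminal_primer_pair(seq: str, length: int = 20) -> Tuple[str, str]:
    """Forward primer from the 5' terminus, reverse primer from the 3' terminus."""
    if len(seq) < 2 * length:
        raise ValueError("sequence too short for non-overlapping primers")
    forward = seq[:length]
    reverse = reverse_complement(seq[-length:])
    return forward, reverse

# Toy example with 4-nt primers for readability.
fwd, rev = terminal_primer_pair("ATGCAAAAAAGGCT", length=4)
print(fwd, rev)  # ATGC AGCC
```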

Panels of corresponding DNA sequences from individuals, prepared in this manner, can
provide unique individual identifications, as each individual will have a unique set of such DNA
sequences due to allelic differences. The sequences of the invention can be used to obtain such
identification sequences from individuals and from tissue. The SECP sequences of the invention
uniquely represent portions of the human genome. Allelic variation occurs to some degree in the
coding regions of these sequences, and to a greater degree in the non-coding regions. It is
estimated that allelic variation between individual humans occurs with a frequency of about once
per 500 bases. Much of the allelic variation is due to single nucleotide polymorphisms
(SNPs), which include restriction fragment length polymorphisms (RFLPs).

Each of the sequences described herein can, to some degree, be used as a standard against
which DNA from an individual can be compared for identification purposes. Because greater
numbers of polymorphisms occur in the non-coding regions, fewer sequences are necessary to
differentiate individuals. The non-coding sequences can comfortably provide positive individual
identification with a panel of perhaps 10 to 1,000 primers that each yield a non-coding amplified
sequence of 100 bases. If predicted coding sequences, such as those in SEQ ID NO:1, 3, 5, 7, 9,
11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56, are used, a more appropriate number of
primers for positive individual identification would be 500-2,000.
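The panel sizes quoted above can be put in perspective with simple arithmetic, assuming (as a simplification) that variant sites occur independently at the stated average rate of about one per 500 bases. Each 100-base amplicon then carries about 0.2 variant sites on average, and roughly 18% of such amplicons contain at least one, which helps explain why tens to hundreds of amplicons are needed before accumulated differences distinguish individuals. The function names are illustrative only.

```python
def expected_variant_sites(n_amplicons: int, amplicon_len: int,
                           bases_per_variant: float = 500.0) -> float:
    """Mean number of variant sites covered by a panel of amplicons."""
    return n_amplicons * amplicon_len / bases_per_variant

def prob_amplicon_has_variant(amplicon_len: int,
                              bases_per_variant: float = 500.0) -> float:
    """P(at least one variant site in one amplicon) under an independent-sites assumption."""
    return 1.0 - (1.0 - 1.0 / bases_per_variant) ** amplicon_len

print(expected_variant_sites(10, 100))            # 2.0
print(expected_variant_sites(1000, 100))          # 200.0
print(round(prob_amplicon_has_variant(100), 3))   # 0.181
```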

Use of Partial SECP Sequences in Forensic Biology 
DNA-based identification techniques can also be used in forensic biology. Forensic
biology is a scientific field employing genetic typing of biological evidence found at a crime
scene as a means for positively identifying, e.g., a perpetrator of a crime. To make such an
identification, PCR technology can be used to amplify DNA sequences taken from very small
biological samples found at a crime scene, such as tissues (e.g., hair or skin) or body fluids
(e.g., blood, saliva, or semen). The amplified sequence can then be compared to a standard,
thereby allowing identification of the origin of the biological sample.

The sequences of the invention can be used to provide polynucleotide reagents, e.g., PCR
primers, targeted to specific loci in the human genome, that can enhance the reliability of
DNA-based forensic identifications by, for example, providing another "identification marker"
(i.e., another DNA sequence that is unique to a particular individual). As mentioned above,
actual base sequence information can be used for identification as an accurate alternative to
patterns formed by restriction enzyme generated fragments. Sequences targeted to non-coding
regions of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 and 56 are
particularly appropriate for this use, as greater numbers of polymorphisms occur in the
non-coding regions, making it easier to differentiate individuals using this technique. Examples
of polynucleotide reagents include the SECP sequences or portions thereof, e.g., fragments
derived from the non-coding regions of one or more of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17,
40, 42, 44, 46, 48, 50, 52, 54 and 56 having a length of at least 20 bases, preferably at least 30
bases.

The SECP sequences described herein can further be used to provide polynucleotide
reagents, e.g., labeled or labelable probes that can be used, for example, in an in situ
hybridization technique, to identify a specific tissue (e.g., brain tissue). This can be very
useful in cases where a forensic pathologist is presented with a tissue of unknown origin. Panels
of such SECP probes can be used to identify tissue by species and/or by organ type.

In a similar fashion, these reagents, e.g., SECP primers or probes, can be used to screen
tissue culture for contamination (i.e., to screen for the presence of a mixture of different types of
cells in a culture).

Predictive Medicine 

The invention also pertains to the field of predictive medicine in which diagnostic assays,
prognostic assays, pharmacogenomics, and monitoring clinical trials are used for prognostic
(predictive) purposes to thereby treat an individual prophylactically. Accordingly, one aspect of
the invention relates to diagnostic assays for determining SECP protein and/or nucleic acid
expression as well as SECP activity, in the context of a biological sample (e.g., blood, serum,
cells, tissue) to thereby determine whether an individual is afflicted with a disease or disorder, or
is at risk of developing a disorder, associated with aberrant SECP expression or activity. The
invention also provides for prognostic (or predictive) assays for determining whether an
individual is at risk of developing a disorder associated with SECP protein, nucleic acid
expression or activity. For example, mutations in a SECP gene can be assayed in a biological
sample. Such assays can be used for prognostic or predictive purposes to thereby prophylactically
treat an individual prior to the onset of a disorder characterized by or associated with SECP
protein, nucleic acid expression, or biological activity.

Another aspect of the invention provides methods for determining SECP protein, nucleic
acid expression or activity in an individual to thereby select appropriate therapeutic or
prophylactic agents for that individual (referred to herein as "pharmacogenomics").
Pharmacogenomics allows for the selection of agents (e.g., drugs) for therapeutic or prophylactic
treatment of an individual based on the genotype of the individual (e.g., the genotype of the
individual examined to determine the ability of the individual to respond to a particular agent).

Yet another aspect of the invention pertains to monitoring the influence of agents (e.g.,
drugs, compounds) on the expression or activity of SECP in clinical trials.

These and various other agents are described in further detail in the following sections.
Diagnostic Assays 

An exemplary method for detecting the presence or absence of SECP in a biological
sample involves obtaining a biological sample from a test subject and contacting the biological
sample with a compound or an agent capable of detecting SECP protein or nucleic acid (e.g.,
mRNA, genomic DNA) that encodes SECP protein such that the presence of SECP is detected in
the biological sample. An agent for detecting SECP mRNA or genomic DNA is a labeled
nucleic acid probe capable of hybridizing to SECP mRNA or genomic DNA. The nucleic acid
probe can be, for example, a full-length SECP nucleic acid, such as the nucleic acid of SEQ ID
NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 40, 42, 44, 46, 48, 50, 52, 54 or 56, or a portion thereof, such as
an oligonucleotide of at least 15, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to
specifically hybridize under stringent conditions to SECP mRNA or genomic DNA. Other
suitable probes for use in the diagnostic assays of the invention are described herein.

An agent for detecting SECP protein is an antibody capable of binding to SECP protein,
preferably an antibody with a detectable label. Antibodies can be polyclonal, or more preferably,
monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2), can be used. The
term "labeled", with regard to the probe or antibody, is intended to encompass direct labeling of
the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or
antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent
that is directly labeled. Examples of indirect labeling include detection of a primary antibody
using a fluorescently-labeled secondary antibody and end-labeling of a DNA probe with biotin
such that it can be detected with fluorescently-labeled streptavidin. The term "biological
sample" is intended to include tissues, cells and biological fluids isolated from a subject, as well
as tissues, cells and fluids present within a subject. That is, the detection method of the invention
can be used to detect SECP mRNA, protein, or genomic DNA in a biological sample in vitro as
well as in vivo. For example, in vitro techniques for detection of SECP mRNA include Northern
hybridizations and in situ hybridizations. In vitro techniques for detection of SECP protein
include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations,
and immunofluorescence. In vitro techniques for detection of SECP genomic DNA include
Southern hybridizations. Furthermore, in vivo techniques for detection of SECP protein include
introducing into a subject a labeled anti-SECP antibody. For example, the antibody can be
labeled with a radioactive marker whose presence and location in a subject can be detected by
standard imaging techniques.

In one embodiment, the biological sample contains protein molecules from the test
subject. Alternatively, the biological sample can contain mRNA molecules from the test subject
or genomic DNA molecules from the test subject. A preferred biological sample is a peripheral
blood leukocyte sample isolated by conventional means from a subject.

In another embodiment, the methods further involve obtaining a control biological
sample from a control subject, contacting the control sample with a compound or agent capable
of detecting SECP protein, mRNA, or genomic DNA, such that the presence of SECP protein,
mRNA or genomic DNA is detected in the biological sample, and comparing the presence of
SECP protein, mRNA or genomic DNA in the control sample with the presence of SECP
protein, mRNA or genomic DNA in the test sample.

The invention also encompasses kits for detecting the presence of SECP in a biological
sample. For example, the kit can comprise: a labeled compound or agent capable of detecting
SECP protein or mRNA in a biological sample; means for determining the amount of SECP in
the sample; and means for comparing the amount of SECP in the sample with a standard. The
compound or agent can be packaged in a suitable container. The kit can further comprise
instructions for using the kit to detect SECP protein or nucleic acid.
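One way to realize the "means for comparing the amount of SECP in the sample with a standard" is a standard curve: known amounts of SECP are measured, a least-squares line is fitted, and the sample signal is converted back to an amount. The linear response and the example numbers below are assumptions for illustration; many immunoassays instead call for a four-parameter logistic fit.

```python
from typing import Sequence, Tuple

def fit_standard_curve(amounts: Sequence[float],
                       signals: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    if n < 2 or n != len(signals):
        raise ValueError("need at least two paired standards")
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_amount(signal: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to read a sample amount off the curve."""
    return (signal - intercept) / slope

# Hypothetical standards (arbitrary units) and one test-sample reading.
slope, intercept = fit_standard_curve([0, 1, 2, 4], [0.1, 0.3, 0.5, 0.9])
print(round(estimate_amount(0.7, slope, intercept), 6))  # 3.0
```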

Prognostic Assays

The diagnostic methods described herein can furthermore be utilized to identify subjects
having or at risk of developing a disease or disorder associated with aberrant SECP expression or
activity. For example, the assays described herein, such as the preceding diagnostic assays or the
following assays, can be utilized to identify a subject having or at risk of developing a disorder
associated with SECP protein, nucleic acid expression or activity. Alternatively, the prognostic
assays can be utilized to identify a subject having or at risk for developing a disease or disorder.
Thus, the invention provides a method for identifying a disease or disorder associated with
aberrant SECP expression or activity in which a test sample is obtained from a subject and SECP
protein or nucleic acid (e.g., mRNA, genomic DNA) is detected, wherein the presence of SECP
protein or nucleic acid is diagnostic for a subject having or at risk of developing a disease or
disorder associated with aberrant SECP expression or activity. As used herein, a "test sample"
refers to a biological sample obtained from a subject of interest. For example, a test sample can
be a biological fluid (e.g., serum), cell sample, or tissue.

Furthermore, the prognostic assays described herein can be used to determine whether a
subject can be administered an agent (e.g., an agonist, antagonist, peptidomimetic, protein,
peptide, nucleic acid, small molecule, or other drug candidate) to treat a disease or disorder
associated with aberrant SECP expression or activity. For example, such methods can be used to
determine whether a subject can be effectively treated with an agent for a disorder. Thus, the
invention provides methods for determining whether a subject can be effectively treated with an
agent for a disorder associated with aberrant SECP expression or activity in which a test sample
is obtained and SECP protein or nucleic acid is detected (e.g., wherein the presence of SECP
protein or nucleic acid is diagnostic for a subject that can be administered the agent to treat a
disorder associated with aberrant SECP expression or activity).

The methods of the invention can also be used to detect genetic lesions in a SECP gene,
thereby determining if a subject with the lesioned gene is at risk for a disorder characterized by
aberrant cell proliferation and/or differentiation. In various embodiments, the methods include
detecting, in a sample of cells from the subject, the presence or absence of a genetic lesion
characterized by at least one of an alteration affecting the integrity of a gene encoding a
SECP protein, or the mis-expression of the SECP gene. For example, such genetic lesions can
be detected by ascertaining the existence of at least one of: (i) a deletion of one or more
nucleotides from a SECP gene; (ii) an addition of one or more nucleotides to a SECP gene;
(iii) a substitution of one or more nucleotides of a SECP gene; (iv) a chromosomal rearrangement
of a SECP gene; (v) an alteration in the level of a messenger RNA transcript of a SECP gene;
(vi) aberrant modification of a SECP gene, such as of the methylation pattern of the genomic
DNA; (vii) the presence of a non-wild-type splicing pattern of a messenger RNA transcript of a
SECP gene; (viii) a non-wild-type level of a SECP protein; (ix) allelic loss of a SECP gene; and
(x) inappropriate post-translational modification of a SECP protein. As described herein, there
are a large number of assay techniques known in the art which can be used for detecting lesions
in a SECP gene. A preferred biological sample is a peripheral blood leukocyte sample isolated
by conventional means from a subject. However, any biological sample containing nucleated
cells may be used, including, for example, buccal mucosal cells.
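Lesion types (i) to (iii) above can be told apart by aligning a sample sequence to the wild-type SECP sequence. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a stand-in aligner; it is not a biological alignment algorithm, it treats sequences as plain strings, and the toy sequences are hypothetical. A dedicated aligner and variant caller would be used in practice, particularly for repetitive sequence where string matching can pick a misleading alignment.

```python
import difflib
from typing import List, Tuple

_LABELS = {"delete": "deletion", "insert": "addition", "replace": "substitution"}

def classify_lesions(reference: str, sample: str) -> List[Tuple[str, int, str, str]]:
    """List (lesion type, reference position, reference bases, sample bases)."""
    matcher = difflib.SequenceMatcher(a=reference, b=sample, autojunk=False)
    return [(_LABELS[tag], i1, reference[i1:i2], sample[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

ref = "GATTACAGCCTGAAGCTTGC"   # hypothetical wild-type stretch
print(classify_lesions(ref, ref[:9] + "T" + ref[10:]))  # [('substitution', 9, 'C', 'T')]
```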

In certain embodiments, detection of the lesion involves the use of a probe/primer in a
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202), such as
anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g.,
Landegren, et al., 1988. Science 241: 1077-1080; and Nakazawa, et al., 1994. Proc. Natl. Acad.
Sci. USA 91: 360-364), the latter of which can be particularly useful for detecting point
mutations in the SECP gene (see Abravaya, et al., 1995. Nucl. Acids Res. 23: 675-682). This
method can include the steps of collecting a sample of cells from a patient, isolating nucleic acid
(e.g., genomic DNA, mRNA or both) from the cells of the sample, contacting the nucleic acid
sample with one or more primers that specifically hybridize to a SECP gene under conditions
such that hybridization and amplification of the SECP gene (if present) occurs, and detecting the
presence or absence of an amplification product, or detecting the size of the amplification
product and comparing the length to a control sample. It is anticipated that PCR and/or LCR
may be desirable to use as a preliminary amplification step in conjunction with any of the
techniques used for detecting mutations described herein.
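The detect-and-size step can be sketched as an in silico PCR: locate the forward primer on the template, locate the reverse complement of the reverse primer downstream of it, and report the expected product length for comparison with the control. Exact string matching stands in for primer annealing (an assumption; real primers tolerate some mismatches), and the primer and template sequences are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]

def expected_product_length(template: str, forward: str,
                            reverse: str) -> Optional[int]:
    """Length of the amplicon spanned by the primers, or None if no product."""
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer anneals to the complementary strand, so its binding
    # site appears on the given strand as the primer's reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start

fwd, rev = "ATGCATGC", "AATTCCGG"
control = "GGG" + fwd + "T" * 10 + "CCGGAATT" + "GGG"
sample = "GGG" + fwd + "T" * 6 + "CCGGAATT" + "GGG"   # hypothetical 4-nt deletion
print(expected_product_length(control, fwd, rev))  # 26
print(expected_product_length(sample, fwd, rev))   # 22
```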

Alternative amplification methods include: self-sustained sequence replication (see
Guatelli, et al., 1990. Proc. Natl. Acad. Sci. USA 87: 1874-1878), transcriptional amplification
system (see Kwoh, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 1173-1177), Qβ Replicase (see
Lizardi, et al., 1988. BioTechnology 6: 1197), or any other nucleic acid amplification method,
followed by the detection of the amplified molecules using techniques well known to those of
skill in the art. These detection schemes are especially useful for the detection of nucleic acid
molecules if such molecules are present in very low numbers.

In an alternative embodiment, mutations in a SECP gene from a sample cell can be
identified by alterations in restriction enzyme cleavage patterns. For example, sample and
control DNA is isolated, (optionally) amplified, digested with one or more restriction
endonucleases, and fragment length sizes are determined by gel electrophoresis and compared.
Differences in fragment length sizes between sample and control DNA indicate mutations in the
sample DNA. Moreover, sequence-specific ribozymes (see, e.g., U.S. Patent No. 5,493,531) can
be used to score for the presence of specific mutations by development or loss of a ribozyme
cleavage site.
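The comparison of cleavage patterns can be sketched by predicting fragment lengths for a linear sequence. EcoRI (G^AATTC) is used only as a familiar example; cut positions are taken on the given strand, overhangs are ignored, and the sequences are hypothetical. A point mutation that destroys the site merges two fragments into one, which would appear as a band shift on the gel.

```python
from typing import List

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[int]:
    """Fragment lengths after cutting a linear sequence at every site occurrence.

    cut_offset is the cut position within the recognition site on the given
    strand (1 for EcoRI, G^AATTC).
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

control = "AAAAGAATTCAAAA"
sample = "AAAAGAATACAAAA"   # hypothetical T->A change inside the site
print(digest(control), digest(sample))  # [5, 9] [14]
```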

In other embodiments, genetic mutations in SECP can be identified by hybridizing
sample and control nucleic acids, e.g., DNA or RNA, to high-density arrays containing hundreds
or thousands of oligonucleotide probes. See, e.g., Cronin, et al., 1996. Human Mutation 7:
244-255; Kozal, et al., 1996. Nat. Med. 2: 753-759. For example, genetic mutations in SECP can
be identified in two-dimensional arrays containing light-generated DNA probes as described in
Cronin, et al., supra. Briefly, a first hybridization array of probes can be used to scan through
long stretches of DNA in a sample and control to identify base changes between the sequences
by making linear arrays of sequential overlapping probes. This step allows the identification of
point mutations. This is followed by a second hybridization array that allows the
characterization of specific mutations by using smaller, specialized probe arrays complementary
to all variants or mutations detected. Each mutation array is composed of parallel probe sets, one
complementary to the wild-type gene and the other complementary to the mutant gene.
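The first-stage scan with sequential overlapping probes can be modeled in a few lines: tile the reference with probes offset by one base, mark each probe that is not a perfect match to the corresponding stretch of the sample, and intersect the mismatched windows to localize a single point change. Exact string comparison stands in for differential hybridization signal, and the sample is assumed to be pre-aligned to the reference with no insertions or deletions; the sequences are hypothetical.

```python
from typing import List, Optional, Tuple

def tile_probes(reference: str, probe_len: int) -> List[Tuple[int, str]]:
    """Sequential overlapping probes, one per starting base."""
    return [(i, reference[i:i + probe_len])
            for i in range(len(reference) - probe_len + 1)]

def mismatched_probes(reference: str, sample: str, probe_len: int) -> List[int]:
    """Start positions of probes that are not a perfect match to the sample."""
    return [i for i, probe in tile_probes(reference, probe_len)
            if sample[i:i + probe_len] != probe]

def localize_change(reference: str, sample: str,
                    probe_len: int) -> Optional[Tuple[int, int]]:
    """Half-open interval shared by every mismatched probe, or None if all match."""
    hits = mismatched_probes(reference, sample, probe_len)
    if not hits:
        return None
    return max(hits), min(hits) + probe_len

ref = "ACGTACGTAC"
smp = "ACGTAGGTAC"   # hypothetical C->G change at position 5
print(mismatched_probes(ref, smp, 4))  # [2, 3, 4, 5]
print(localize_change(ref, smp, 4))    # (5, 6)
```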

In yet another embodiment, any of a variety of sequencing reactions known in the art can
be used to directly sequence the SECP gene and detect mutations by comparing the sequence of
the sample SECP with the corresponding wild-type (control) sequence. Examples of sequencing
reactions include those based on techniques developed by Maxam and Gilbert, 1977. Proc. Natl.
Acad. Sci. USA 74: 560 or Sanger, 1977. Proc. Natl. Acad. Sci. USA 74: 5463. It is also
contemplated that any of a variety of automated sequencing procedures can be utilized when
performing the diagnostic assays (see, e.g., Naeve, et al., 1995. Biotechniques 19: 448),
including sequencing by mass spectrometry (see, e.g., PCT International Publication No. WO
94/16101; Cohen, et al., 1996. Adv. Chromatography 36: 127-162; and Griffin, et al., 1993.
Appl. Biochem. Biotechnol. 38: 147-159).

Other methods for detecting mutations in the SECP gene include methods in which
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA
heteroduplexes. See, e.g., Myers, et al., 1985. Science 230: 1242. In general, the art technique
of "mismatch cleavage" starts by providing heteroduplexes formed by hybridizing (labeled)
RNA or DNA containing the wild-type SECP sequence with potentially mutant RNA or DNA
obtained from a tissue sample. The double-stranded duplexes are treated with an agent that
cleaves single-stranded regions of the duplex, such as those that will exist due to base-pair
mismatches between the control and sample strands. For instance, RNA/DNA duplexes can be
treated with RNase and DNA/DNA hybrids treated with S1 nuclease to enzymatically digest
the mismatched regions. In other embodiments, either DNA/DNA or RNA/DNA duplexes can
be treated with hydroxylamine or osmium tetroxide and with piperidine in order to digest
mismatched regions. After digestion of the mismatched regions, the resulting material is then
separated by size on denaturing polyacrylamide gels to determine the site of mutation. See, e.g.,
Cotton, et al., 1988. Proc. Natl. Acad. Sci. USA 85: 4397; Saleeba, et al., 1992. Methods
Enzymol. 217: 286-295. In an embodiment, the control DNA or RNA can be labeled for
detection.
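How cleavage at a mismatch reveals the mutation site can be sketched with a toy model. The labeled wild-type probe is compared base by base with the sample (both written in the same sense for simplicity), the duplex is assumed to be cut immediately 5' of each mismatched position, and fragment sizes are read off. With the probe labeled at its 5' end, only the first fragment is visible, so its length gives the distance from the labeled end to the mutation. Equal-length, pre-aligned strands and one clean cut per mismatch are simplifying assumptions, and the sequences are hypothetical.

```python
from typing import List

def mismatch_positions(probe: str, sample: str) -> List[int]:
    """Positions where two aligned, same-sense sequences disagree."""
    if len(probe) != len(sample):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (a, b) in enumerate(zip(probe, sample)) if a != b]

def cleavage_fragments(duplex_len: int, cuts: List[int]) -> List[int]:
    """Fragment lengths when the duplex is cut just before each position in `cuts`."""
    inner = sorted({c for c in cuts if 0 < c < duplex_len})
    bounds = [0] + inner + [duplex_len]
    return [end - start for start, end in zip(bounds, bounds[1:])]

probe = "ACGTACGTACGTACGTACGT"          # hypothetical 20-nt wild-type probe
sample = probe[:12] + "T" + probe[13:]  # hypothetical change at position 12
frags = cleavage_fragments(len(probe), mismatch_positions(probe, sample))
print(frags)     # [12, 8]
print(frags[0])  # 12, the size of the 5'-labeled fragment
```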

In still another embodiment, the mismatch cleavage reaction employs one or more
proteins that recognize mismatched base pairs in double-stranded DNA (so-called "DNA
mismatch repair" enzymes) in defined systems for detecting and mapping point mutations in
SECP cDNAs obtained from samples of cells. For example, the mutY enzyme of E. coli cleaves
A at G/A mismatches and the thymidine DNA glycosylase from HeLa cells cleaves T at G/T
mismatches. See, e.g., Hsu, et al., 1994. Carcinogenesis 15: 1657-1662. According to an
exemplary embodiment, a probe based on a SECP sequence, e.g., a wild-type SECP sequence, is
hybridized to a cDNA or other DNA product from a test cell(s). The duplex is treated with a
DNA mismatch repair enzyme, and the cleavage products, if any, can be detected from
electrophoresis protocols or the like. See, e.g., U.S. Patent No. 5,459,039.
In other embodiments, alterations in electrophoretic mobility will be used to identify
mutations in SECP genes. For example, single strand conformation polymorphism (SSCP) may
be used to detect differences in electrophoretic mobility between mutant and wild type nucleic
acids. See, e.g., Orita, et al., 1989. Proc. Natl. Acad. Sci. USA 86: 2766; Cotton, 1993. Mutat.
Res. 285: 125-144; Hayashi, 1992. Genet. Anal. Tech. Appl. 9: 73-79. Single-stranded DNA
fragments of sample and control SECP nucleic acids will be denatured and allowed to renature.
The secondary structure of single-stranded nucleic acids varies according to sequence, and the
resulting alteration in electrophoretic mobility enables the detection of even a single base change.
The DNA fragments may be labeled or detected with labeled probes. The sensitivity of the assay
may be enhanced by using RNA (rather than DNA), in which the secondary structure is more
sensitive to a change in sequence. In one embodiment, the subject method utilizes heteroduplex
analysis to separate double stranded heteroduplex molecules on the basis of changes in
electrophoretic mobility. See, e.g., Keen, et al., 1991. Trends Genet. 7: 5.

In yet another embodiment, the movement of mutant or wild-type fragments in
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient gel
electrophoresis (DGGE). See, e.g., Myers, et al., 1985. Nature 313: 495. When DGGE is used
as the method of analysis, DNA will be modified to ensure that it does not completely denature,
for example by adding a GC clamp of approximately 40 bp of high-melting GC-rich DNA by
PCR. In a further embodiment, a temperature gradient is used in place of a denaturing gradient
to identify differences in the mobility of control and sample DNA. See, e.g., Rosenbaum and
Reissner, 1987. Biophys. Chem. 265: 12753.

Examples of other techniques for detecting point mutations include, but are not limited
to, selective oligonucleotide hybridization, selective amplification, or selective primer extension.
For example, oligonucleotide primers may be prepared in which the known mutation is placed
centrally and then hybridized to target DNA under conditions that permit hybridization only if a
perfect match is found. See, e.g., Saiki, et al., 1986. Nature 324: 163; Saiki, et al., 1989. Proc.
Natl. Acad. Sci. USA 86: 6230. Such allele-specific oligonucleotides can be hybridized to
PCR-amplified target DNA or, when the oligonucleotides are attached to the hybridizing
membrane and hybridized with labeled target DNA, used to test for a number of different
mutations.
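Designing a pair of allele-specific oligonucleotides with the variant placed centrally can be sketched as below. Substring search stands in for hybridization under stringent conditions (signal only on a perfect match), which is a strong simplification of real duplex stability; the 17-nt default length, the reference, and the variant are hypothetical.

```python
from typing import Tuple

def allele_specific_oligos(reference: str, position: int, alt_base: str,
                           length: int = 17) -> Tuple[str, str]:
    """Wild-type and mutant oligos with the variant base placed centrally."""
    if length % 2 == 0:
        raise ValueError("use an odd length so the variant sits at the center")
    half = length // 2
    start = position - half
    if start < 0 or start + length > len(reference):
        raise ValueError("variant too close to the end of the reference")
    wild = reference[start:start + length]
    mutant = wild[:half] + alt_base + wild[half + 1:]
    return wild, mutant

def perfect_match(oligo: str, target: str) -> bool:
    """Stand-in for stringent hybridization: signal only on an exact match."""
    return oligo in target

ref = "GATTACAGCCTGAAGCTTGC"
mut = ref[:9] + "T" + ref[10:]          # hypothetical C->T at position 9
wt_oligo, mut_oligo = allele_specific_oligos(ref, 9, "T", length=7)
print(wt_oligo, mut_oligo)  # AGCCTGA AGCTTGA
print(perfect_match(wt_oligo, mut), perfect_match(mut_oligo, mut))  # False True
```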

Alternatively, allele-specific amplification technology that depends on selective PCR amplification may be used in conjunction with the instant invention. Oligonucleotides used as primers for specific amplification may carry the mutation of interest in the center of the molecule (so that amplification depends on differential hybridization; see, e.g., Gibbs, et al., 1989. Nucl. Acids Res. 17: 2437-2448) or at the extreme 3'-terminus of one primer where, under appropriate conditions, a mismatch can prevent or reduce polymerase extension (see, e.g., Prossner, 1993. Tibtech 11: 238). In addition, it may be desirable to introduce a novel restriction site in the region of the mutation to create cleavage-based detection. See, e.g., Gasparini, et al., 1992. Mol. Cell Probes 6: 1. It is anticipated that in certain embodiments amplification may also be performed using Taq ligase. See, e.g., Barany, 1991. Proc. Natl. Acad. Sci. USA 88: 189. In such cases, ligation will occur only if there is a perfect match at the 3'-terminus of the 5' sequence, making it possible to detect the presence of a known mutation at a specific site by looking for the presence or absence of amplification.
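The 3'-terminal mismatch principle described above, shared by allele-specific amplification and the ligation-based approach, can be illustrated with a comparable sketch: product is expected only when the 3'-terminal base of the allele-specific oligonucleotide matches the target. The model assumes that the primer is written 5' to 3' in the same sense as the target strand shown, that its binding position is known, and that mismatches away from the 3'-terminus are tolerated. All names and sequences are hypothetical.

```python
# Idealized 3'-terminal mismatch rule for allele-specific amplification:
# a primer (5'->3', same sense as `target`) placed at `start` is assumed to be
# extended only if its 3'-terminal base matches the target at that position.
# Mismatches elsewhere in the primer are ignored in this simplified model.


def predicts_extension(primer: str, target: str, start: int) -> bool:
    """Return True if the primer's 3'-terminal base matches the target."""
    end = start + len(primer) - 1  # index of the primer's 3'-terminal base
    if end >= len(target):
        return False               # primer runs off the template
    return primer[-1].upper() == target[end].upper()


def call_alleles(sample_alleles: list[str], primers: dict[str, str], start: int) -> list[str]:
    """Names of allele-specific primers predicted to give product on the sample."""
    return sorted(
        name
        for name, primer in primers.items()
        if any(predicts_extension(primer, allele, start) for allele in sample_alleles)
    )


# Hypothetical alleles differing at index 11; each primer ends on that base.
wild_type = "AAGTTGACCGTAGGCATCCAGT"
mutant = "AAGTTGACCGTTGGCATCCAGT"
primers = {"wild-type": wild_type[:12], "mutant": mutant[:12]}

print(call_alleles([wild_type], primers, start=0))          # ['wild-type']
print(call_alleles([wild_type, mutant], primers, start=0))  # ['mutant', 'wild-type']
```

A heterozygous sample yields product with both primers, so the presence or absence of amplification with each primer reports the genotype at the site.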

The methods described herein may be performed, for example, by utilizing pre-packaged 
diagnostic kits comprising at least one probe nucleic acid or antibody reagent described herein, 
which may be conveniently used, e.g., in clinical settings to diagnose patients exhibiting 
symptoms or family history of a disease or illness involving a SECP gene. 

Furthermore, any cell type or tissue, preferably peripheral blood leukocytes, in which 
SECP is expressed may be utilized in the prognostic assays described herein. However, any 
biological sample containing nucleated cells may be used, including, for example, buccal 
mucosal cells. 

Pharmacogenomics 

Agents or modulators that have a stimulatory or inhibitory effect on SECP activity (e.g., SECP gene expression), as identified by a screening assay described herein, can be administered to individuals to treat (prophylactically or therapeutically) disorders (e.g., cancer or immune disorders) associated with aberrant SECP activity. In conjunction with such treatment, the pharmacogenomics (i.e., the study of the relationship between an individual's genotype and that individual's response to a foreign compound or drug) of the individual may be considered. Differences in metabolism of therapeutics can lead to severe toxicity or therapeutic failure by altering the relation between dose and blood concentration of the pharmacologically active drug. Thus, the pharmacogenomics of the individual permits the selection of effective agents (e.g., drugs) for prophylactic or therapeutic treatments based on a consideration of the individual's genotype. Such pharmacogenomics can further be used to determine appropriate dosages and therapeutic regimens. Accordingly, the activity of SECP protein, expression of SECP nucleic acid, or mutation content of SECP genes in an individual can be determined to thereby select appropriate agent(s) for therapeutic or prophylactic treatment of the individual.
Pharmacogenomics deals with clinically significant hereditary variations in the response to drugs due to altered drug disposition and abnormal action in affected persons. See, e.g., Eichelbaum, 1996. Clin. Exp. Pharmacol. Physiol. 23: 983-985; Linder, 1997. Clin. Chem. 43: 254-266. In general, two types of pharmacogenetic conditions can be differentiated: genetic conditions transmitted as a single factor altering the way drugs act on the body (altered drug action), and genetic conditions transmitted as single factors altering the way the body acts on drugs (altered drug metabolism). These pharmacogenetic conditions can occur either as rare defects or as polymorphisms. For example, glucose-6-phosphate dehydrogenase (G6PD) deficiency is a common inherited enzymopathy in which the main clinical complication is hemolysis after ingestion of oxidant drugs (anti-malarials, sulfonamides, analgesics, nitrofurans) and consumption of fava beans.

As an illustrative embodiment, the activity of drug metabolizing enzymes is a major determinant of both the intensity and duration of drug action. The discovery of genetic polymorphisms of drug metabolizing enzymes (e.g., N-acetyltransferase 2 (NAT 2) and cytochrome P450 enzymes CYP2D6 and CYP2C19) has provided an explanation as to why some patients do not obtain the expected drug effects or show exaggerated drug response and serious toxicity after taking the standard and safe dose of a drug. These polymorphisms are expressed in two phenotypes in the population, the extensive metabolizer (EM) and the poor metabolizer (PM). The prevalence of PM differs among populations. For example, the gene coding for CYP2D6 is highly polymorphic and several mutations have been identified in PM, all of which lead to the absence of functional CYP2D6. Poor metabolizers of CYP2D6 and CYP2C19 quite frequently experience exaggerated drug response and side effects when they receive standard doses. If a metabolite is the active therapeutic moiety, PM show no therapeutic response, as demonstrated for the analgesic effect of codeine mediated by its CYP2D6-formed metabolite morphine. At the other extreme are the so-called ultra-rapid metabolizers, who do not respond to standard doses. Recently, the molecular basis of ultra-rapid metabolism has been identified as CYP2D6 gene amplification.

Thus, the activity of SECP protein, expression of SECP nucleic acid, or mutation content of SECP genes in an individual can be determined to thereby select appropriate agent(s) for therapeutic or prophylactic treatment of the individual. In addition, pharmacogenetic studies can be used to apply genotyping of polymorphic alleles encoding drug-metabolizing enzymes to the identification of an individual's drug responsiveness phenotype. This knowledge, when applied to dosing or drug selection, can avoid adverse reactions or therapeutic failure and thus enhance therapeutic or prophylactic efficiency when treating a subject with a SECP modulator, such as a modulator identified by one of the exemplary screening assays described herein.

Monitoring of Effects During Clinical Trials 

Monitoring the influence of agents (e.g., drugs, compounds) on the expression or activity of SECP (e.g., the ability to modulate aberrant cell proliferation and/or differentiation) can be applied not only in basic drug screening, but also in clinical trials. For example, the effectiveness of an agent determined by a screening assay as described herein to increase SECP gene expression, increase protein levels, or upregulate SECP activity can be monitored in clinical trials of subjects exhibiting decreased SECP gene expression, decreased protein levels, or down-regulated SECP activity. Alternatively, the effectiveness of an agent determined by a screening assay to decrease SECP gene expression, decrease protein levels, or down-regulate SECP activity can be monitored in clinical trials of subjects exhibiting increased SECP gene expression, increased protein levels, or up-regulated SECP activity. In such clinical trials, the expression or activity of SECP and, preferably, of other genes that have been implicated in, for example, a cellular proliferation or immune disorder can be used as a "read out" or marker of the immune responsiveness of a particular cell.

By way of example, and not of limitation, genes, including SECP, that are modulated in cells by treatment with an agent (e.g., compound, drug or small molecule) that modulates SECP activity (e.g., identified in a screening assay as described herein) can be identified. Thus, to study the effect of agents on cellular proliferation disorders, for example, in a clinical trial, cells can be isolated and RNA prepared and analyzed for the levels of expression of SECP and other genes implicated in the disorder. The levels of gene expression (i.e., a gene expression pattern) can be quantified by Northern blot analysis or RT-PCR, as described herein, or alternatively by measuring the amount of protein produced, by one of the methods as described herein, or by measuring the levels of activity of SECP or other genes. In this manner, the gene expression pattern can serve as a marker, indicative of the physiological response of the cells to the agent. Accordingly, this response state may be determined before, and at various points during, treatment of the individual with the agent.

In one embodiment, the invention provides a method for monitoring the effectiveness of treatment of a subject with an agent (e.g., an agonist, antagonist, protein, peptide, peptidomimetic, nucleic acid, small molecule, or other drug candidate identified by the screening assays described herein) comprising the steps of (i) obtaining a pre-administration sample from a subject prior to administration of the agent; (ii) detecting the level of expression of a SECP protein, mRNA, or genomic DNA in the pre-administration sample; (iii) obtaining one or more post-administration samples from the subject; (iv) detecting the level of expression or activity of the SECP protein, mRNA, or genomic DNA in the post-administration samples; (v) comparing the level of expression or activity of the SECP protein, mRNA, or genomic DNA in the pre-administration sample with that in the post-administration sample or samples; and (vi) altering the administration of the agent to the subject accordingly. For example, increased administration of the agent may be desirable to increase the expression or activity of SECP to higher levels than detected, i.e., to increase the effectiveness of the agent. Alternatively, decreased administration of the agent may be desirable to decrease expression or activity of SECP to lower levels than detected, i.e., to decrease the effectiveness of the agent.
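Steps (v) and (vi) of this method can be sketched as a simple comparison of pre- and post-administration levels. The sketch assumes that SECP expression or activity has already been quantified as a single positive number per sample; the target fold change and the tolerance band are hypothetical parameters that a clinical protocol would have to define, and the output is a direction of adjustment, not a dosing rule.

```python
from statistics import mean


def fold_change(pre_level: float, post_levels: list[float]) -> float:
    """Step (v): mean post-administration level relative to the pre-administration level."""
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    return mean(post_levels) / pre_level


def suggest_adjustment(pre_level: float, post_levels: list[float],
                       target_fold: float, tolerance: float = 0.2) -> str:
    """Step (vi), sketched: return 'increase', 'decrease' or 'maintain'.

    target_fold > 1 describes an agent meant to raise SECP expression or activity;
    target_fold < 1 describes one meant to lower it. Both values are hypothetical.
    """
    observed = fold_change(pre_level, post_levels)
    low, high = target_fold * (1 - tolerance), target_fold * (1 + tolerance)
    if low <= observed <= high:
        return "maintain"
    upregulating = target_fold >= 1
    # Falling short of the intended change suggests more agent; overshooting, less.
    short_of_target = observed < low if upregulating else observed > high
    return "increase" if short_of_target else "decrease"


print(suggest_adjustment(10.0, [11.0, 13.0], target_fold=2.0))  # increase
print(suggest_adjustment(10.0, [18.0, 22.0], target_fold=2.0))  # maintain
```

The tolerance band avoids recommending a change for small fluctuations around the target; its width is an arbitrary choice here.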

Methods of Treatment 

The invention provides for both prophylactic and therapeutic methods of treating a subject at risk of (or susceptible to) a disorder or having a disorder associated with aberrant SECP expression or activity. These methods of treatment are discussed more fully below.

Disease and Disorders 

Diseases and disorders that are characterized by increased (relative to a subject not suffering from the disease or disorder) levels or biological activity may be treated with Therapeutics that antagonize (i.e., reduce or inhibit) activity. Therapeutics that antagonize activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may be utilized include, but are not limited to: (i) an aforementioned peptide, or analogs, derivatives, fragments or homologs thereof; (ii) antibodies to an aforementioned peptide; (iii) nucleic acids encoding an aforementioned peptide; (iv) administration of antisense nucleic acid and nucleic acids that are "dysfunctional" (i.e., due to a heterologous insertion within the coding sequence of an aforementioned peptide) that are utilized to "knockout" endogenous function of an aforementioned peptide by homologous recombination (see, e.g., Capecchi, 1989. Science 244: 1288-1292); or (v) modulators (i.e., inhibitors, agonists and antagonists, including additional peptide mimetics of the invention or antibodies specific to a peptide of the invention) that alter the interaction between an aforementioned peptide and its binding partner.

Diseases and disorders that are characterized by decreased (relative to a subject not suffering from the disease or disorder) levels or biological activity may be treated with Therapeutics that increase (i.e., are agonists to) activity. Therapeutics that upregulate activity may be administered in a therapeutic or prophylactic manner. Therapeutics that may be utilized include, but are not limited to, an aforementioned peptide, or analogs, derivatives, fragments or homologs thereof; or an agonist that increases bioavailability.

Increased or decreased levels can be readily detected by quantifying peptide and/or RNA, by obtaining a patient tissue sample (e.g., from biopsy tissue) and assaying it in vitro for RNA or peptide levels, structure and/or activity of the expressed peptides (or mRNAs of an aforementioned peptide). Methods that are well-known within the art include, but are not limited to, immunoassays (e.g., by Western blot analysis, immunoprecipitation followed by sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis, immunocytochemistry, etc.) and/or hybridization assays to detect expression of mRNAs (e.g., Northern assays, dot blots, in situ hybridization, and the like).

Prophylactic Methods 

In one aspect, the invention provides a method for preventing, in a subject, a disease or condition associated with an aberrant SECP expression or activity, by administering to the subject an agent that modulates SECP expression or at least one SECP activity. Subjects at risk for a disease that is caused or contributed to by aberrant SECP expression or activity can be identified by, for example, any or a combination of diagnostic or prognostic assays as described herein. Administration of a prophylactic agent can occur prior to the manifestation of symptoms characteristic of the SECP aberrancy, such that a disease or disorder is prevented or, alternatively, delayed in its progression. Depending upon the type of SECP aberrancy, for example, a SECP agonist or SECP antagonist agent can be used for treating the subject. The appropriate agent can be determined based on screening assays described herein.

Therapeutic Methods 

Another aspect of the invention pertains to methods of modulating SECP expression or activity for therapeutic purposes. The modulatory method of the invention involves contacting a cell with an agent that modulates one or more of the activities of SECP protein associated with the cell. An agent that modulates SECP protein activity can be an agent as described herein, such as a nucleic acid or a protein, a naturally-occurring cognate ligand of a SECP protein, a peptide, a SECP peptidomimetic, or other small molecule. In one embodiment, the agent stimulates one or more SECP protein activities. Examples of such stimulatory agents include active SECP protein and a nucleic acid molecule encoding SECP that has been introduced into the cell. In another embodiment, the agent inhibits one or more SECP protein activities. Examples of such inhibitory agents include antisense SECP nucleic acid molecules and anti-SECP antibodies. These modulatory methods can be performed in vitro (e.g., by culturing the cell with the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject). As such, the invention provides methods of treating an individual afflicted with a disease or disorder characterized by aberrant expression or activity of a SECP protein or nucleic acid molecule. In one embodiment, the method involves administering an agent (e.g., an agent identified by a screening assay described herein), or combination of agents, that modulates (e.g., up-regulates or down-regulates) SECP expression or activity. In another embodiment, the method involves administering a SECP protein or nucleic acid molecule as therapy to compensate for reduced or aberrant SECP expression or activity.

Stimulation of SECP activity is desirable in situations in which SECP is abnormally down-regulated and/or in which increased SECP activity is likely to have a beneficial effect. One example of such a situation is where a subject has a disorder characterized by aberrant cell proliferation and/or differentiation (e.g., cancer or immune associated disorders). Another example of such a situation is where the subject has a gestational disease (e.g., pre-eclampsia).

Determination of the Biological Effect of the Therapeutic 

In various embodiments of the invention, suitable in vitro or in vivo assays are performed to determine the effect of a specific Therapeutic and whether its administration is indicated for treatment of the affected tissue.

In various specific embodiments, in vitro assays may be performed with representative cells of the type(s) involved in the patient's disorder, to determine if a given Therapeutic exerts the desired effect upon the cell type(s). Compounds for use in therapy may be tested in suitable animal model systems including, but not limited to, rats, mice, chickens, cows, monkeys, rabbits, and the like, prior to testing in human subjects. Similarly, for in vivo testing, any of the animal model systems known in the art may be used prior to administration to human subjects.

Prophylactic and Therapeutic Uses of the Compositions of the Invention 

The SECP nucleic acids and proteins of the invention may be useful in a variety of potential prophylactic and therapeutic applications. By way of a non-limiting example, a cDNA encoding the SECP protein of the invention may be useful in gene therapy, and the protein may be useful when administered to a subject in need thereof.
Both the novel nucleic acids encoding the SECP proteins, and the SECP proteins of the invention, or fragments thereof, may also be useful in diagnostic applications, wherein the presence or amount of the nucleic acid or the protein is to be assessed. These materials are further useful in the generation of antibodies which immunospecifically bind to the novel substances of the invention for use in therapeutic or diagnostic methods.

The invention will be further illustrated in the following non-limiting examples. 



Example 1: Radiation Hybrid Mapping Provides the Chromosomal Location of SECP 2 (Clone 11618130.0.27)

Radiation hybrid mapping using human chromosome markers was carried out to determine the chromosomal location of a SECP2 nucleic acid of the invention. The procedure used to obtain these results is described generally in Steen, et al., 1999. A High-Density Integrated Genetic Linkage and Radiation Hybrid Map of the Laboratory Rat, Genome Res. 9: AP1-AP8 (Published Online on May 21, 1999). A panel of 93 cell clones containing randomized radiation-induced human chromosomal fragments was then screened in 96-well plates using PCR primers designed to identify the sought clones in a unique fashion. Clone 11618130.0.27, a SECP2 nucleic acid, was located on chromosome 16 at a map distance of 26.0 cR from marker WI-3768 and -70.5 cR from marker TIGR-A002K05.

Example 2: Molecular Cloning of Clone 11618130

Oligonucleotide PCR primers were designed to amplify a DNA segment coding for the full-length open reading frame of clone 11618130. The forward primer included a BglII restriction site and the consensus Kozak sequence CCACC. The reverse primer contained an in-frame XhoI restriction site. Both primers contained a CTCGTC 5'-terminus clamp. The nucleotide sequences of the primers were:

11618130 Forward Primer:

CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG (SEQ ID NO: 19) 

11618130 Reverse Primer: 

CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG (SEQ ID NO: 20) 
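The features designed into these two primers can be located programmatically. The sketch below searches for the CTCGTC clamp, the Kozak sequence followed by the ATG start codon, and the standard recognition sequences of BamHI (GGATCC), BglII (AGATCT) and XhoI (CTCGAG); the constant and function names are hypothetical, and the search reports only the first occurrence of each motif as a 0-based index.

```python
# Locate the 5' clamp, restriction sites and Kozak/ATG context in the
# 11618130 cloning primers (SEQ ID NOS: 19 and 20). Indices are 0-based.

CLAMP = "CTCGTC"              # 5'-terminal clamp used in both primers
KOZAK_ATG = "CCACC" + "ATG"   # Kozak sequence followed by the start codon
SITES = {                     # standard recognition sequences
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
    "XhoI": "CTCGAG",
}


def annotate_primer(primer: str) -> dict:
    """Report the clamp, first index of each restriction site, and Kozak/ATG position."""
    seq = primer.upper()
    return {
        "clamp": seq.startswith(CLAMP),
        "sites": {name: seq.find(site) for name, site in SITES.items() if site in seq},
        "kozak_atg": seq.find(KOZAK_ATG),  # -1 when absent
    }


FORWARD = "CTCGTCAGATCTCCACCATGAGTGATGAGGACAGCTGTGTAG"  # SEQ ID NO: 19
REVERSE = "CTCGTCCTCGAGGCAGCTGGTTGGTTGGCTTATGTTG"       # SEQ ID NO: 20

print(annotate_primer(FORWARD))  # {'clamp': True, 'sites': {'BglII': 6}, 'kozak_atg': 12}
print(annotate_primer(REVERSE))  # {'clamp': True, 'sites': {'XhoI': 6}, 'kozak_atg': -1}
```

In both primers the restriction site immediately follows the six-base clamp, which gives the enzyme flanking bases to bind when cutting near the end of the PCR product.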

The PCR reactions included: 5 ng human fetal brain cDNA template; 1 µM of each of the 11618130 Forward and 11618130 Reverse primers; 5 µM dNTP (Clontech Laboratories; Palo Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto, CA) in 50 µl total reaction volume. The following PCR conditions were used:
a) 96°C, 3 minutes
b) 96°C, 30 seconds denaturation
c) 70°C, 30 seconds primer annealing; this temperature was gradually decreased by 1°C/cycle
d) 72°C, 1 minute extension
Repeat steps b-d a total of 10 times.
e) 96°C, 30 seconds denaturation
f) 60°C, 30 seconds annealing
g) 72°C, 1 minute extension
Repeat steps e-g a total of 25 times.
h) 72°C, 5 minutes final extension
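The annealing schedule in steps a) to h) is a touchdown program. A minimal sketch that expands it into per-cycle annealing temperatures follows; it assumes that the first of the 10 touchdown cycles anneals at 70°C and that the temperature drops by 1°C before each subsequent cycle, which is one reading of "gradually decreased by 1°C/cycle". The function name is hypothetical.

```python
def touchdown_annealing_schedule(start_c: float = 70.0, step_c: float = 1.0,
                                 touchdown_cycles: int = 10,
                                 final_c: float = 60.0, final_cycles: int = 25) -> list[float]:
    """Annealing temperature (°C) for each cycle of steps b-d followed by steps e-g."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]  # 70, 69, ..., 61
    return touchdown + [final_c] * final_cycles


schedule = touchdown_annealing_schedule()
print(len(schedule))   # 35
print(schedule[:10])   # [70.0, 69.0, 68.0, 67.0, 66.0, 65.0, 64.0, 63.0, 62.0, 61.0]
print(schedule[10])    # 60.0
```

Under this reading the touchdown phase ends at 61°C, one degree above the 60°C used for the remaining 25 cycles, so the annealing temperature falls continuously over the whole program.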

A single, amplified product of approximately 800 bp was detected by agarose gel electrophoresis. The PCR amplification product was then isolated by the QIAEX II® Gel Extraction System (QIAGEN, Inc; Valencia, CA) in a final volume of 20 µl.

A total of 10 µl of the isolated fragment was digested with BglII and XhoI restriction enzymes and ligated into the BamHI- and XhoI-digested mammalian expression vector pcDNA3.1-V5His (Invitrogen; Carlsbad, CA); BglII and BamHI generate compatible cohesive ends. The construct was sequenced, and the cloned insert was verified as a sequence identical to the ORF coding for the full-length 11618130. The construct was designated pcDNA3.1-11618130-S178-2.

Example 3: Expression of 11618130 in Human Embryonic Kidney 293 Cells

The vector pcDNA3.1-11618130-S178-2 described in Example 2 was subsequently transfected into human embryonic kidney 293 cells (ATCC No. CRL-1573; Manassas, VA) using the LipofectaminePlus Reagent following the manufacturer's instructions (Gibco/BRL/Life Technologies; Rockville, MD). The cell pellet and supernatant were harvested 72 hours after transfection, and examined for 11618130 expression by SDS-PAGE under reducing conditions and Western blotting with an anti-V5 antibody. FIG. 12 shows that 11618130 was expressed as a protein having an apparent molecular weight (Mr) of approximately 34 kilodaltons (kDa), which was expressed intracellularly in the 293 cells. These experimental results were consistent with the predicted molecular weight of 28043 Daltons for the protein of clone 11618130.0.27 and with the predicted intracellular localization of the protein in the microbody (peroxisome). A second band of approximately 54 kDa was also found, which may represent a non-reducible dimer of this protein.
Example 4: Preparation of Mammalian Expression Vector pSecV5His

The oligonucleotide primers, pSec-V5-His Forward and pSec-V5-His Reverse, were 
generated to amplify a fragment from the pcDNA3.1-V5His (Invitrogen; Carlsbad, CA) 
expression vector that includes V5 and His6. The nucleotide sequences of these primers were: 

pSec-V5-His Forward Primer:

CTCGTCCTCGAGGGTAAGCCTATCCCTAAC (SEQ ID NO:21)

pSec-V5-His Reverse Primer:

CTCGTCGGGCCCCTGATCAGCGGGTTTAAAC (SEQ ID NO:22)

The PCR product was digested with XhoI and ApaI, and ligated into the XhoI/ApaI-digested pSecTag2 B vector harboring an Ig kappa leader sequence (Invitrogen; Carlsbad, CA). The correct structure of the resulting vector (designated pSecV5His), including an in-frame Ig kappa leader and V5-His6, was verified by DNA sequence analysis. The pSecV5His vector included an in-frame Ig kappa leader, a site for insertion of a clone of interest, V5 and His6, which allows heterologous protein expression and secretion by fusing any protein to the Ig kappa chain signal peptide. Detection and purification of the expressed protein were aided by the presence of the V5 epitope tag and 6x His tag at the carboxyl-terminus (Invitrogen; Carlsbad, CA).

Example 5: Molecular Cloning of 16406477 

Oligonucleotide PCR primers were designed to amplify a DNA segment encoding the mature form of clone 16406477, from amino acid residues 38 to 385, taking into account the signal sequence predicted for this polypeptide. The forward primer contained an in-frame BamHI restriction site and the reverse primer contained an in-frame XhoI restriction site. Both primers contained the CTCGTC 5' clamp. The sequences of the primers were as follows:

16406477 Forward Primer: 

CTCGTCGGATCCTGGGGCGCAGGGGAAGCCCCGGG (SEQ ID NO:23)

16406477 Reverse Primer: 

CTCGTCCTCGAGGAGGGCAGCAAGGAGGCTGAGGGGCAG (SEQ ID NO:24) 

The PCR reactions contained: 5 ng human fetal brain cDNA template; 1 µM of each of the 16406477 Forward and 16406477 Reverse Primers; 5 µM dNTP (Clontech Laboratories; Palo Alto, CA) and 1 µl of 50x Advantage-HF 2 polymerase (Clontech Laboratories; Palo Alto, CA) in a 50 µl total reaction volume. PCR was then conducted using reaction conditions identical to those previously described in Example 2.

A single, amplified product of approximately 1 Kbp was detected by agarose gel electrophoresis. The product was then isolated by the QIAEX II® Gel Extraction System (QIAGEN, Inc; Valencia, CA) in a final volume of 20 µl.
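The approximately 1 Kbp product is consistent with the length of the targeted region, as a short worked calculation shows. The sketch assumes that the template-annealing portions of the two primers begin and end exactly at the codons for residues 38 and 385, and that each primer adds only its CTCGTC clamp and 6-nt restriction site (12 nt in total) beyond the coding sequence; neither assumption is stated in the text, and the function names are hypothetical.

```python
def coding_length_nt(first_residue: int, last_residue: int) -> int:
    """Nucleotides encoding residues first_residue..last_residue inclusive."""
    return 3 * (last_residue - first_residue + 1)


def expected_product_bp(first_residue: int, last_residue: int,
                        fwd_tail: int = 12, rev_tail: int = 12) -> int:
    """Coding length plus the non-templated 5' tails (clamp + site) of both primers."""
    return coding_length_nt(first_residue, last_residue) + fwd_tail + rev_tail


print(coding_length_nt(38, 385))     # 1044 (348 residues x 3 nt)
print(expected_product_bp(38, 385))  # 1068, i.e. roughly 1 Kbp
```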

A total of 10 µl of the isolated fragment was digested with BamHI and XhoI restriction enzymes and ligated into the pSecV5His mammalian expression vector (see Example 4), which had previously been digested with BamHI and XhoI. The construct was sequenced, and the cloned insert was verified as possessing a sequence identical to that of the ORF coding for the mature fragment of clone 16406477. The construct was subsequently designated pSecV5His-16406477-S196-A.

Example 6: Expression of 16406477 in Human Embryonic Kidney 293 Cells 

The pSecV5His-16406477-S196-A construct (see Example 5) was subsequently transfected into 293 cells (ATCC No. CRL-1573; Manassas, VA) using the LipofectaminePlus Reagent following the manufacturer's instructions (Gibco/BRL/Life Technologies). The cell pellet and supernatant were harvested 72 hours after transfection, and examined for 16406477 expression by SDS-PAGE under reducing conditions and Western blotting with an anti-V5 antibody. FIG. 13 demonstrates that 16406477 is expressed as a protein having an apparent molecular weight (Mr) of approximately 45 kDa, which is retained intracellularly in the 293 cells. The Mr value found upon expression of the clone is consistent with the predicted molecular weight of 43087 Daltons.

Example 7: Quantitative Tissue Expression Analysis of Clones of the Invention 

The quantitative expression analysis of several clones of the invention was performed in 41 normal and 55 tumor samples (see FIG. 14) by real-time quantitative PCR (TAQMAN®) using a Perkin-Elmer Biosystems ABI PRISM® 7700 Sequence Detection System. The following abbreviations are used in FIG. 14:

ca. = carcinoma,
* = established from metastasis,
met = metastasis,
s cell var = small cell variant,
non-s = non-sm = non-small,
squam = squamous,
pl. eff = pl effusion = pleural effusion,
glio = glioma,
astro = astrocytoma, and
neuro = neuroblastoma.

Initially, 96 RNA samples were normalized to β-actin and GAPDH. RNA (~50 ng total or ~1 ng poly(A)+) was converted to cDNA using the TAQMAN® Reverse Transcription Reagents Kit (PE Biosystems; Foster City, CA; Catalog No. N808-0234) and random hexamers according to the manufacturer's protocol. Reactions were performed in a 20 µl total volume and incubated for 30 minutes at 48°C. cDNA (5 µl) was then transferred to a separate plate for the TAQMAN® reaction using β-actin and GAPDH TAQMAN® Assay Reagents (PE Biosystems; Catalog Nos. 4310881E and 4310884E, respectively) and TAQMAN® Universal PCR Master Mix (PE Biosystems; Catalog No. 4304447) according to the manufacturer's protocol. Reactions were performed in a 25 µl total volume using the following parameters: 2 minutes at 50°C; 10 minutes at 95°C; then 40 cycles of 15 seconds at 95°C and 1 minute at 60°C.

Results were recorded as CT values (i.e., the cycle at which a given sample crosses a threshold level of fluorescence) using a log scale, with the difference in RNA concentration between a given sample and the sample with the lowest CT value being represented as 2^ΔCT, where ΔCT is the difference between the two CT values. The percent relative expression is then obtained by taking the reciprocal of this RNA difference and multiplying by 100. The average CT values obtained for β-actin and GAPDH were used to normalize RNA samples. The RNA sample generating the highest CT value required no further dilution, while all other samples were diluted relative to this sample according to their average β-actin/GAPDH CT values.
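The relative-expression and normalization calculations described above can be sketched in a few lines. The sketch assumes 100% amplification efficiency (a doubling of product per cycle), which is what representing the RNA difference as 2 raised to the power of the CT difference implies; the function names, sample names and CT values are hypothetical.

```python
def percent_relative_expression(ct: dict[str, float]) -> dict[str, float]:
    """100 / 2**(CT - CT_min): the lowest-CT sample is set to 100%."""
    ct_min = min(ct.values())
    return {sample: 100.0 / 2 ** (value - ct_min) for sample, value in ct.items()}


def normalization_dilution(reference_ct: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Fold dilution per sample from its mean beta-actin/GAPDH CT.

    The sample with the highest mean CT (least RNA) is left undiluted (factor 1);
    every other sample is diluted 2**(CT_max - CT_sample)-fold.
    """
    mean_ct = {s: (actin + gapdh) / 2 for s, (actin, gapdh) in reference_ct.items()}
    ct_max = max(mean_ct.values())
    return {s: 2 ** (ct_max - value) for s, value in mean_ct.items()}


print(percent_relative_expression({"a": 20.0, "b": 21.0, "c": 23.0}))
# {'a': 100.0, 'b': 50.0, 'c': 12.5}
print(normalization_dilution({"x": (18.0, 20.0), "y": (20.0, 22.0)}))
# {'x': 4.0, 'y': 1.0}
```

A sample whose mean reference CT is two cycles lower than the highest-CT sample contains roughly four times as much template under this assumption, hence the four-fold dilution.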

Normalized RNA (5 µl) was converted to cDNA and analyzed via TAQMAN® using One Step RT-PCR Master Mix Reagents (PE Biosystems; Catalog No. 4309169) and gene-specific primers according to the manufacturer's instructions. Probes and primers were designed for each assay according to Perkin Elmer Biosystems' Primer Express Software package (Version I for Apple Computer's Macintosh Power PC) using the sequence of the respective clones as input. Default settings were used for reaction conditions, and the following parameters were set before selecting primers: primer concentration = 250 nM; primer melting temperature (Tm) range = 58-60°C; primer optimal Tm = 59°C; maximum primer Tm difference = 2°C; probe does not possess a 5'-terminal G; probe Tm must be 10°C greater than primer Tm; and amplicon size 75 bp to 100 bp in length. The probes and primers were synthesized by Synthegen (Houston, TX). Probes were double-purified by HPLC to remove uncoupled dye and then evaluated by mass spectrometry to verify coupling of reporter and quencher dyes to the 5'- and 3'-termini of the probe, respectively. Their final concentrations were: forward and reverse primers, 900 nM each; probe, 200 nM.
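The primer and probe selection rules listed above can be expressed as a simple checker. The sketch takes melting temperatures as inputs rather than computing them (Primer Express uses its own Tm model, which is not reproduced here), reads "10°C greater" as "at least 10°C greater" than the higher primer Tm, and uses hypothetical Tm values; the Ag 383 sequences and its 403-478 target region are taken from the primer sets listed in this example. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Oligo:
    seq: str
    tm: float  # melting temperature in °C, supplied by the design software


def amplicon_length(first_nt: int, last_nt: int) -> int:
    """Length of an amplicon spanning first_nt..last_nt inclusive."""
    return last_nt - first_nt + 1


def design_issues(forward: Oligo, reverse: Oligo, probe: Oligo, amplicon_bp: int) -> list[str]:
    """Return the selection rules that are violated (empty list if all are met)."""
    issues = []
    for name, primer in (("forward", forward), ("reverse", reverse)):
        if not 58.0 <= primer.tm <= 60.0:
            issues.append(f"{name} primer Tm outside 58-60 C")
    if abs(forward.tm - reverse.tm) > 2.0:
        issues.append("primer Tm difference exceeds 2 C")
    if probe.seq.upper().startswith("G"):
        issues.append("probe has a 5'-terminal G")
    if probe.tm < max(forward.tm, reverse.tm) + 10.0:
        issues.append("probe Tm less than 10 C above primer Tm")
    if not 75 <= amplicon_bp <= 100:
        issues.append("amplicon outside 75-100 bp")
    return issues


# Ag 383 sequences with hypothetical Tm values; target region nt 403-478.
fwd = Oligo("ggcctctccgtacccttctc", 59.0)
rev = Oligo("agaggctcttggcgcagtt", 59.5)
probe = Oligo("accaggatcacgacctccgcagg", 70.0)
print(amplicon_length(403, 478))                                   # 76
print(design_issues(fwd, rev, probe, amplicon_length(403, 478)))  # []
```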

Subsequent PCR conditions were as follows. Normalized RNA from each tissue and each cell line was spotted in each well of a 96-well PCR plate (Perkin Elmer Biosystems). PCR reaction mixes, including two probes (i.e., a SECP-specific probe and another gene-specific probe multiplexed with the SECP-specific probe), were set up using 1x TaqMan™ PCR Master Mix for the PE Biosystems 7700, with 5 mM MgCl2; dNTPs (dA, dG, dC, dU at 1:1:1:2 ratios); 0.25 U/ml AmpliTaq Gold™ (PE Biosystems); 0.4 U/µl RNase inhibitor; and 0.25 U/µl reverse transcriptase. Reverse transcription was then performed at 48°C for 30 minutes, followed by amplification/PCR cycles as follows: 95°C for 10 minutes, then 40 cycles of 95°C for 15 seconds and 60°C for 1 minute.

The primer-probe sets employed in the expression analysis of each clone, and a summary of the results, are provided below. The complete experimental results are illustrated in FIG. 14. The panel of cell lines employed was identical in all cases, except that samples 95 and 96 were gDNA and melanoma UACC-257 (control), respectively, in the experiments for clone 11696905. The nucleotide sequences of the primer sets used for these clones are as follows:

Clone 11696905.0.47 Primer Set: 

Ag 383 (F): 5'-ggcctctccgtacccttctc-3' (SEQ ID NO:25)

Ag 383 (R): 5'-agaggctcttggcgcagtt-3' (SEQ ID NO:26)

Ag 383 (P): TET-5'-accaggatcacgacctccgcagg-3'-TAMRA (SEQ ID NO:27)

Primer Set Ag 383 was designed to probe for nucleotides 403-478 in SECP3 (clone 11696905.0.47). The results indicate that the clone was prominently expressed in normal cells such as adipose, adrenal gland, various regions of the brain, skeletal muscle, bladder, liver and fetal liver, mammary gland, placenta, prostate and testis. It was also found to be expressed at levels much higher than in comparable normal cells in cancers of the kidney and lung, and at levels much lower than in comparable normal cells in cancers of the central nervous system (CNS) and breast. These results suggest that SECP3 (clone 11696905.0.47), or fragments thereof, may be useful in probing for cancer in kidney and lung, and that the nucleic acid or the protein of clone 11696905.0.47 may be a target for therapeutic agents in such cancers. These nucleic acids and proteins may also be useful as therapeutic agents in treating cancers of the CNS and breast.

Clone 16406477.0.206 Primer Set: 
Ag 53 (F): 5'-GCCTGGCACGGACTATGTGT-3' (SEQ ID NO:28)

Ag 53 (R): 5'-GCCGTCAGCCTTGGAAAGT-3' (SEQ ID NO:29)

Ag 53 (P): TET-5'-CCATTCCCGCTGCACTGTGACG-3'-TAMRA (SEQ ID NO:30)

SECP7 (clone 16406477.0.206) was found to be expressed essentially exclusively in testis cells, with a low level of expression in the hypothalamus, among the cells tested.

Clone 21433858 Primer Set: 

Ag 127 (F): 5'-CCTGCCAGGATGACTGTCAATT-3' (SEQ ID NO:31)

Ag 127 (R): 5'-TGGTCCTAACTGCACCACAGTCT-3' (SEQ ID NO:32)

Ag 127 (P): TET-5'-CCAGCTGGTCCAAGTTTTCTTCATGCAA-3'-TAMRA (SEQ ID NO:33)

Probe set Ag 127 targets nucleotides 2524-2601 of SECP1 (clone 21433858). The results show that the clone is expressed principally in normal tissues such as adipose, brain, bladder, fetal and adult kidney, mammary gland, myometrium, uterus, placenta, and testis. In comparison to normal lung tissue, it is highly expressed in a small cell lung cancer, a large cell lung cancer, and a non-small cell lung cancer. Therefore, SECP1 (clone 21433858), or a fragment thereof, may be useful as a diagnostic probe for such lung cancers. The nucleic acids or proteins of SECP1 (clone 21433858) may furthermore serve as targets for the treatment of cancer in these and other tissues.

Clone 21637262.0.64 Primer Set: 

Ab5(F): 5'-GTGATCCTCAGGCTGGACCA-3' (SEQ ID NO:34)

Ab5(R): 5'-ttctgactgggctgcatcc-3' (SEQ ID NO:35)

Ab5(P): FAM-5'-CCAGTGTTTCCTCAGCACAGGGCC-3'-TAMRA (SEQ ID NO:36)

Probe set Ab5 targets nucleotides 1221-1298 in SECP9 (clone 21637262.0.64). The results shown in FIG. 14 demonstrate that, among the cells examined, SECP9 (clone 21637262.0.64) is expressed in cells from normal tissues including, especially, the salivary gland and trachea.



Table ??. Probe and Primer Set: Ag 815 for CG106318_01

Primers   Sequences                                    TM     Length   Start Position   SEQ ID NO
Forward   5'-TGTGCTCAGCACATGGTCTA-3'                   59     20       1722             37
Probe     FAM-5'-ACACCTGCTCAGGGAAAACGACAGAA-3'-TAMRA
Reverse   5'-TCGTGCTCGTATCTGTTTCC-3'                   58.9   20
Other Embodiments

While the invention has been described in conjunction with the detailed description 
thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, 
which is defined by the scope of the appended claims. Other aspects, advantages, and 
modifications are within the scope of the following claims. 